2012-08-27 18:58:09

by Toshi Kani

Subject: [PATCH] hpwdt: Fix kdump issue in hpwdt

kdump can be interrupted by the watchdog timer when the timer is left
activated on the crash kernel. Change the hpwdt driver to disable the
watchdog timer at boot time. This ensures that the watchdog timer stays
disabled until /dev/watchdog is opened, and prevents it from being left
running on the crash kernel.

Signed-off-by: Toshi Kani <[email protected]>
Tested-by: Lisa Mitchell <[email protected]>
Cc: [email protected]
---
drivers/watchdog/hpwdt.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index 1eff743..ae60406 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -814,6 +814,9 @@ static int __devinit hpwdt_init_one(struct pci_dev *dev,
 	hpwdt_timer_reg = pci_mem_addr + 0x70;
 	hpwdt_timer_con = pci_mem_addr + 0x72;
 
+	/* Make sure that timer is disabled until /dev/watchdog is opened */
+	hpwdt_stop();
+
 	/* Make sure that we have a valid soft_margin */
 	if (hpwdt_change_timer(soft_margin))
 		hpwdt_change_timer(DEFAULT_MARGIN);
--
1.7.7.6
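
The effect of the added hpwdt_stop() call can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not driver code: struct wdt_model, WDT_ENABLE_BIT and the model_* functions are hypothetical names, and the single "running" bit is a simplification rather than the real iLO register layout.

```c
#include <stdint.h>

/* Assumption for this sketch: bit 0 of the control byte means "running". */
#define WDT_ENABLE_BIT 0x01u

/* Hypothetical stand-in for the timer registers; not the real MMIO layout. */
struct wdt_model {
    uint16_t timer_reg; /* timeout value, in seconds for simplicity */
    uint8_t  timer_con; /* control byte */
};

/* Models hpwdt_stop(): clear the running bit. */
static void model_stop(struct wdt_model *w)
{
    w->timer_con = (uint8_t)(w->timer_con & ~WDT_ENABLE_BIT);
}

/* Models starting the timer: load the timeout and set the running bit. */
static void model_start(struct wdt_model *w, uint16_t secs)
{
    w->timer_reg = secs;
    w->timer_con = (uint8_t)(w->timer_con | WDT_ENABLE_BIT);
}

static int model_running(const struct wdt_model *w)
{
    return (w->timer_con & WDT_ENABLE_BIT) != 0;
}

/*
 * Models the probe path. With stop_at_probe == 0 (before the patch) the
 * timer state inherited from the previous kernel is left untouched.
 * With stop_at_probe != 0 (after the patch) the timer is always stopped.
 */
static void model_probe(struct wdt_model *w, int stop_at_probe)
{
    if (stop_at_probe)
        model_stop(w);
}

/* Models opening /dev/watchdog: the timer is armed again. */
static void model_open(struct wdt_model *w, uint16_t secs)
{
    model_start(w, secs);
}
```

In this model, a timer left running by the crashed kernel survives model_probe(&w, 0) but is stopped by model_probe(&w, 1), and model_open() arms it again, which mirrors the behaviour described in the commit message.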


2012-08-27 19:01:39

by Tom Mingarelli

Subject: RE: [PATCH] hpwdt: Fix kdump issue in hpwdt

Wim:

I acknowledge and accept this patch.


Thanks,
Tom


2012-08-27 19:38:32

by Lars Marowsky-Bree

Subject: Re: [PATCH] hpwdt: Fix kdump issue in hpwdt

On 2012-08-27T12:52:24, Toshi Kani <[email protected]> wrote:

> kdump can be interrupted by watchdog timer when the timer is left
> activated on the crash kernel. Changed the hpwdt driver to disable
> watchdog timer at boot-time. This assures that watchdog timer is
> disabled until /dev/watchdog is opened, and prevents watchdog timer
> to be left running on the crash kernel.

How does this protect against the system hanging again in the crash
kernel, or possibly hardware caches to flush more data to shared
storage?

(I'm asking from the perspective of the hpwdt being used as a fencing
mechanism in a cluster setting.)

Or is the argument that it's "very unlikely" that a system in such a
state would not make it far enough into the crash kernel?


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

2012-08-27 19:59:32

by Tom Mingarelli

Subject: RE: [PATCH] hpwdt: Fix kdump issue in hpwdt

The main issue here is when an NMI comes in (which is hpwdt's main focus...to source NMIs and then panic the box) and the system is configured for kdump. We want the kdump to succeed and if the iLO watchdog timer is left alone to keep running, the kdump will not succeed. It will be interrupted by an ASR. This change ensures that the iLO Watchdog timer is always stopped in the booting case (of any kernel) or when an NMI arrives and we are in the process of taking a kdump.


Tom


2012-08-27 20:50:22

by Toshi Kani

Subject: RE: [PATCH] hpwdt: Fix kdump issue in hpwdt

On Mon, 2012-08-27 at 19:57 +0000, Mingarelli, Thomas wrote:
> The main issue here is when an NMI comes in (which is hpwdt's main
> focus...to source NMIs and then panic the box) and the system is
> configured for kdump. We want the kdump to succeed and if the iLO
> watchdog timer is left alone to keep running, the kdump will not
> succeed. It will be interrupted by an ASR. This change ensures that
> the iLO Watchdog timer is always stopped in the booting case (of any
> kernel) or when an NMI arrives and we are in the process of taking a
> kdump.

And this change does not prevent running the watchdog daemon on the
crash kernel, if we want to detect a hang condition on the crash kernel.
The timer is re-enabled when /dev/watchdog is opened. The change only
ensures that the timer is not enabled until the daemon starts up. A timer
running on the crash kernel without the daemon started is a problem, as
it causes kdump to be interrupted.

Thanks,
-Toshi
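
For readers unfamiliar with the daemon side, the open/ping/close cycle mentioned above might look roughly like the following. This is a minimal sketch, assuming the generic Linux watchdog character-device behaviour (opening the device arms the timer, any write pings it, and writing 'V' before close requests a disarm). The wd_* helper names are hypothetical, and whether the 'V' "magic close" is honored depends on the driver and its nowayout setting.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open the watchdog device node (normally "/dev/watchdog").
 * With the generic watchdog API, a successful open arms the timer.
 * Returns a file descriptor, or -1 on error.
 */
int wd_open(const char *path)
{
    return open(path, O_WRONLY);
}

/*
 * Ping the watchdog. Any write to the device counts as a keepalive.
 * Returns 0 on success, -1 on error.
 */
int wd_ping(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/*
 * Request a disarm and close the device. Writing 'V' immediately before
 * close is the "magic close" convention; if the driver ignores it, the
 * timer keeps running after close.
 * Returns 0 on success, -1 if either the write or the close failed.
 */
int wd_disarm_and_close(int fd)
{
    int rc = write(fd, "V", 1) == 1 ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A daemon on the crash kernel would call wd_open("/dev/watchdog") once, call wd_ping() at an interval shorter than the configured timeout, and call wd_disarm_and_close() on a clean shutdown. Until something opens the device, the patched driver leaves the timer stopped.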

