2010-02-23 15:40:28

by Simon Kagstrom

Subject: [PATCH] iTCO_wdt: Don't stop on shutdown with nowayout

Currently, the watchdog is turned off when the system shuts down or the
module is unloaded. If nowayout has been selected, this makes no sense
and fails to restart the system if it hangs during reboot, so make it
conditional.

Signed-off-by: Simon Kagstrom <[email protected]>
---
We have a system which has such a hang, and therefore want the watchdog
to be on until the bitter end.

drivers/watchdog/iTCO_wdt.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index 4bdb7f1..927df26 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -839,7 +839,8 @@ static int __devexit iTCO_wdt_remove(struct platform_device *dev)

static void iTCO_wdt_shutdown(struct platform_device *dev)
{
- iTCO_wdt_stop();
+ if (!nowayout)
+ iTCO_wdt_stop();
}

#define iTCO_wdt_suspend NULL
--
1.6.0.4
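The decision this patch makes can be modelled outside the kernel. The sketch below is a hypothetical userspace stand-in, not driver code. `struct wdt_model`, `wdt_model_stop()` and `wdt_model_shutdown()` are invented names standing in for the `nowayout` module parameter, `iTCO_wdt_stop()` and the patched `iTCO_wdt_shutdown()`.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for driver state (not the driver's real
 * variables): nowayout mirrors the module parameter, hw_running
 * models whether the TCO timer is still counting down. */
struct wdt_model {
	bool nowayout;
	bool hw_running;
};

/* Stand-in for iTCO_wdt_stop(): halts the hardware timer. */
static void wdt_model_stop(struct wdt_model *w)
{
	w->hw_running = false;
}

/* Same decision as the patched iTCO_wdt_shutdown(): with nowayout set,
 * leave the timer running so a hang during reboot still resets the box. */
static void wdt_model_shutdown(struct wdt_model *w)
{
	if (!w->nowayout)
		wdt_model_stop(w);
}
```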


2010-02-23 16:29:57

by Pádraig Brady

Subject: Re: [PATCH] iTCO_wdt: Don't stop on shutdown with nowayout

On 23/02/10 15:40, Simon Kagstrom wrote:
> Currently, the watchdog is turned off when the system shuts down or the
> module is unloaded. If nowayout has been selected, this makes no sense
> and fails to restart the system if it hangs during reboot, so make it
> conditional.
>
> Signed-off-by: Simon Kagstrom<[email protected]>
> ---
> We have a system which has such a hang, and therefore want the watchdog
> to be on until the bitter end.
>
> drivers/watchdog/iTCO_wdt.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
> index 4bdb7f1..927df26 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -839,7 +839,8 @@ static int __devexit iTCO_wdt_remove(struct platform_device *dev)
>
> static void iTCO_wdt_shutdown(struct platform_device *dev)
> {
> - iTCO_wdt_stop();
> + if (!nowayout)
> + iTCO_wdt_stop();
> }
>
> #define iTCO_wdt_suspend NULL

I see the issue. However, what happens if you're
rebooting into a system that doesn't then re-enable the watchdog?
I've seen systems where the hardware watchdog is not reset
during the reboot process, in which case you'll get a
reboot while running the other system.

If you had a readonly system, then perhaps you
can just WDIOC_SETTIMEOUT the hardware watchdog timeout to 1s
and wait for it to reboot the system?

cheers,
Pádraig.
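A minimal userspace sketch of the WDIOC_SETTIMEOUT suggestion, assuming the watchdog is exposed as `/dev/watchdog` and the caller has already opened it. `wdt_force_reset()` is a hypothetical helper, not part of any existing tool. A 1 s request is likely to be refused with EINVAL on iTCO_wdt, since the driver enforces the minimum quoted in its module description, so the helper takes the timeout as a parameter and prints the value the driver writes back.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical helper: program the shortest acceptable timeout on an
 * already-open watchdog fd, then stop petting it so the hardware resets
 * the machine. Returns -1 (errno set by ioctl) if the driver rejects
 * the request; on success it never returns. */
static int wdt_force_reset(int fd, int timeout_s)
{
	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s) < 0)
		return -1;

	/* The driver writes back the timeout it actually programmed. */
	fprintf(stderr, "watchdog timeout now %d s, waiting for reset\n",
		timeout_s);
	for (;;)
		pause();
}
```

On a real system this would run after filesystems have been remounted read-only, e.g. with `fd = open("/dev/watchdog", O_WRONLY)` and `wdt_force_reset(fd, 5)`.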

2010-02-23 17:14:31

by Pádraig Brady

Subject: [PATCH] iTCO_wdt: Don't double the requested timeout

Actually looking at that code I noticed that it wasn't
accounting for the timer counting down twice before reboot,
which I thought was the case for ICH4 at least.
The following is not even compiled, nor am I sure it
applies to TCO v2. Testing/info appreciated.
I might be able to dig out an ICH4 system at some stage.

cheers,
Pádraig.

--- a/iTCO_wdt.c 2009-06-10 03:05:27.000000000 +0000
+++ b/iTCO_wdt.c 2010-02-23 17:02:07.829640740 +0000
@@ -274,7 +274,7 @@
static int heartbeat = WATCHDOG_HEARTBEAT; /* in seconds */
module_param(heartbeat, int, 0);
MODULE_PARM_DESC(heartbeat, "Watchdog heartbeat in seconds. "
- "(2<heartbeat<39 (TCO v1) or 613 (TCO v2), default="
+ "(4<heartbeat<78 (TCO v1) or 1226 (TCO v2), default="
__MODULE_STRING(WATCHDOG_HEARTBEAT) ")");

static int nowayout = WATCHDOG_NOWAYOUT;
@@ -290,8 +290,8 @@
static inline unsigned int seconds_to_ticks(int seconds)
{
/* the internal timer is stored as ticks which decrement
- * every 0.6 seconds */
- return (seconds * 10) / 6;
+ * every 0.6 seconds. The timer counts down twice before reboot */
+ return (seconds * 10) / 3;
}

static void iTCO_wdt_set_NO_REBOOT_bit(void)
@@ -721,8 +721,8 @@
if (iTCO_wdt_set_heartbeat(heartbeat)) {
iTCO_wdt_set_heartbeat(WATCHDOG_HEARTBEAT);
printk(KERN_INFO PFX
- "heartbeat value must be 2 < heartbeat < 39 (TCO v1) "
- "or 613 (TCO v2), using %d\n", heartbeat);
+ "heartbeat value must be 4 < heartbeat < 78 (TCO v1) "
+ "or 1226 (TCO v2), using %d\n", heartbeat);
}

ret = misc_register(&iTCO_wdt_miscdev);

2010-02-24 06:25:49

by Simon Kagstrom

Subject: Re: [PATCH] iTCO_wdt: Don't stop on shutdown with nowayout

On Tue, 23 Feb 2010 16:24:52 +0000
> > + if (!nowayout)
> > + iTCO_wdt_stop();
> > }
> >
> > #define iTCO_wdt_suspend NULL
>
> I see the issue. However, what happens if you're
> rebooting into a system that doesn't then re-enable the watchdog?
> I've seen systems where the hardware watchdog is not reset
> during the reboot process, in which case you'll get a
> reboot while running the other system.

Well, in that case I would run without nowayout. I just think it is a
bit strange for the watchdog to be turned off at all when nowayout
is set.

Thanks for your suggestion though!

// Simon

2010-02-24 09:16:27

by Simon Kagstrom

Subject: Re: [PATCH] iTCO_wdt: Don't double the requested timeout

On Tue, 23 Feb 2010 17:09:26 +0000
Pádraig Brady <[email protected]> wrote:

> Actually looking at that code I noticed that it wasn't
> accounting for the timer counting down twice before reboot,
> which I thought was the case for ICH4 at least.
> The following is not even compiled, nor am I sure it
> applies to TCO v2. Testing/info appreciated.

I tested your change on our TCO v1-based board, and it doubles the time
until the watchdog triggers. So:

Tested-by: Simon Kagstrom <[email protected]>

2010-02-24 16:23:22

by Pádraig Brady

Subject: Re: [PATCH] iTCO_wdt: Don't double the requested timeout

On 24/02/10 09:16, Simon Kagstrom wrote:
> On Tue, 23 Feb 2010 17:09:26 +0000
> Pádraig Brady<[email protected]> wrote:
>
>> Actually looking at that code I noticed that it wasn't
>> accounting for the timer counting down twice before reboot,
>> which I thought was the case for ICH4 at least.
>> The following is not even compiled, nor am I sure it
>> applies to TCO v2. Testing/info appreciated.
>
> I tested your change on our TCO v1-based board, and it doubles the time
> until the watchdog triggers. So:

Well it should be halving the timeout :)
I amended the patch and Simon retested to verify
that it now honors the requested timeout.

I also checked an ICH7 box here and it doesn't
seem to need the adjustment, so I've amended the patch accordingly.
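The arithmetic behind "it doubles the time" and "it should be halving the timeout" can be checked directly. Let $s$ be the requested timeout in seconds and $t$ the programmed tick count. A tick is 0.6 s and, on TCO v1, the timer must count down twice, so the real delay is $T = 2 \cdot 0.6\,t$ (integer truncation ignored):

```latex
\begin{aligned}
\text{original: }    & t = \tfrac{10s}{6} & \Rightarrow\; T &= 2 \cdot 0.6 \cdot \tfrac{10s}{6} = 2s \\
\text{first draft: } & t = \tfrac{10s}{3} & \Rightarrow\; T &= 2 \cdot 0.6 \cdot \tfrac{10s}{3} = 4s \\
\text{amended: }     & t = \tfrac{5s}{6}  & \Rightarrow\; T &= 2 \cdot 0.6 \cdot \tfrac{5s}{6}  = s
\end{aligned}
```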

Wim, please apply, thanks...


iTCO_wdt: fix TCO V1 timeout values and limits

For TCO V1 devices the programmed timeout was twice as long as
requested, because the driver did not account for the TCO V1 timer
having to count down twice before triggering the watchdog.
Also the timeout values in the module description and error
message were clarified.

Signed-off-by: Pádraig Brady <[email protected]>
Tested-by: Simon Kagstrom <[email protected]>

--- a/iTCO_wdt.c 2010-02-24 13:43:54.815676860 +0000
+++ b/iTCO_wdt.c 2010-02-24 15:18:33.341640363 +0000
@@ -304,8 +304,8 @@
#define WATCHDOG_HEARTBEAT 30 /* 30 sec default heartbeat */
static int heartbeat = WATCHDOG_HEARTBEAT; /* in seconds */
module_param(heartbeat, int, 0);
-MODULE_PARM_DESC(heartbeat, "Watchdog heartbeat in seconds. "
- "(2<heartbeat<39 (TCO v1) or 613 (TCO v2), default="
+MODULE_PARM_DESC(heartbeat, "Watchdog timeout in seconds. "
+ "(5..76 (TCO v1) or 3..614 (TCO v2), default="
__MODULE_STRING(WATCHDOG_HEARTBEAT) ")");

static int nowayout = WATCHDOG_NOWAYOUT;
@@ -321,8 +321,12 @@
static inline unsigned int seconds_to_ticks(int seconds)
{
/* the internal timer is stored as ticks which decrement
- * every 0.6 seconds */
- return (seconds * 10) / 6;
+ * every 0.6 seconds. For TCO v1 the timer counts down
+ * twice before triggering the watchdog */
+ if (iTCO_wdt_private.iTCO_version == 1)
+ return (seconds * 5) / 6;
+ else
+ return (seconds * 10) / 6;
}

static void iTCO_wdt_set_NO_REBOOT_bit(void)
@@ -756,9 +760,14 @@
if not reset to the default */
if (iTCO_wdt_set_heartbeat(heartbeat)) {
iTCO_wdt_set_heartbeat(WATCHDOG_HEARTBEAT);
+ int tco_min=5; int tco_max=76; /* TCO V1 */
+ if (iTCO_wdt_private.iTCO_version == 2) {
+ tco_min=3; tco_max=614;
+ }
printk(KERN_INFO PFX
- "heartbeat value must be 2 < heartbeat < 39 (TCO v1) "
- "or 613 (TCO v2), using %d\n", heartbeat);
+ "timeout value %d is not between %d and %d"
+ " inclusive, using %d\n",
+ heartbeat, tco_min, tco_max, WATCHDOG_HEARTBEAT);
}

ret = misc_register(&iTCO_wdt_miscdev);
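The tick conversion and the new 5..76 / 3..614 limits can be cross-checked with a small standalone program. This is a sketch: the accepted tick range (0x04 up to 0x3f on TCO v1, up to 0x3ff on TCO v2) is an assumption about iTCO_wdt_set_heartbeat() that does not appear in this thread, and `timeout_valid()` and `timeout_range()` are hypothetical helpers.

```c
#include <stdbool.h>

/* Assumption: the driver accepts tick counts from 0x04 up to 0x3f on
 * TCO v1 and up to 0x3ff on TCO v2. */
#define TCO_MIN_TICKS    0x04
#define TCO_V1_MAX_TICKS 0x03f
#define TCO_V2_MAX_TICKS 0x3ff

/* Same conversion as the amended patch: one tick is 0.6 s, and a
 * TCO v1 timer must count down twice, so v1 gets half the ticks. */
static unsigned int seconds_to_ticks(int version, int seconds)
{
	if (version == 1)
		return (seconds * 5) / 6;
	return (seconds * 10) / 6;
}

static bool timeout_valid(int version, int seconds)
{
	unsigned int max = (version == 1) ? TCO_V1_MAX_TICKS : TCO_V2_MAX_TICKS;
	unsigned int ticks = seconds_to_ticks(version, seconds);

	return ticks >= TCO_MIN_TICKS && ticks <= max;
}

/* Scan for the smallest and largest accepted timeouts in seconds. */
static void timeout_range(int version, int *lo, int *hi)
{
	*lo = *hi = -1;
	for (int s = 1; s <= 2000; s++) {
		if (timeout_valid(version, s)) {
			if (*lo < 0)
				*lo = s;
			*hi = s;
		}
	}
}
```

Under those assumptions the scan yields 5..76 s for v1 and 3..614 s for v2, matching the patch. Changing the v1 branch back to `(seconds * 10) / 6` gives 3..38 s, which agrees with the old "2<heartbeat<39" text.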

2010-02-25 07:45:42

by Simon Kagstrom

Subject: Re: [PATCH] iTCO_wdt: Don't double the requested timeout

(Adding lkml back on CC. The discussion is about stopping watchdogs
before rebooting.)

On Wed, 24 Feb 2010 16:10:00 +0000
Pádraig Brady <[email protected]> wrote:
> > Returning to the initial issue (my patch to avoid stopping the watchdog
> > before reboot): What is the preferred behavior? I've looked in other
> > drivers, and see multiple ways being used. Some do as in my patch, some
> > leave it on unconditionally and some stop it unconditionally.
> >
> Well nowayout to me means userspace should have no way out,
> but when rebooting the system the watchdog should be reset.
> But in saying that I'm not sure what to do. At least there
> should be some way to select the operation you want above,
> so as to protect the reboot process itself.

Well, from the drivers I saw that had this behavior (not that I checked
all of them), they did look at nowayout to determine whether to stop it
or not.

> In general, I wonder could an order be specified so that
> the watchdog is disabled as the very last thing by the kernel,
> right before it does the reboot?

Many other drivers use reboot notifiers, but unfortunately it seems
that these are called before device shutdown (kernel/sys.c), so it
wouldn't help here.


I guess it would be good to have defined and uniform behavior across
different watchdogs, and at least an option to specify
nowayout-also-when-rebooting.

// Simon

2010-04-07 16:27:39

by Pádraig Brady

Subject: Re: [PATCH] iTCO_wdt: Don't double the requested timeout

On 24/02/10 16:18, Pádraig Brady wrote:
> On 24/02/10 09:16, Simon Kagstrom wrote:
>> On Tue, 23 Feb 2010 17:09:26 +0000
>> Pádraig Brady<[email protected]> wrote:
>>
>>> Actually looking at that code I noticed that it wasn't
>>> accounting for the timer counting down twice before reboot,
>>> which I thought was the case for ICH4 at least.
>>> The following is not even compiled, nor am I sure it
>>> applies to TCO v2. Testing/info appreciated.
>>
>> I tested your change on our TCO v1-based board, and it doubles the time
>> until the watchdog triggers. So:
>
> Well it should be halving the timeout :)
> I amended the patch and Simon retested to verify
> that it now honors the requested timeout.
>
> I also checked an ICH7 box here and it doesn't
> seem to need the adjustment, so I've amended the patch accordingly.
>
> Wim, please apply, thanks...

In further testing it was seen that the "timer status" bit
needs to be cleared at each pat of the watchdog so as
to support timeouts in the 34s to 76s range.
This was done with: outb (0x08, TCO1_STS);
The updated patch is below.

cheers,
Pádraig.
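Why the status bit matters can be illustrated with a toy model of TCO v1 as described in this thread: the first countdown to zero latches the timeout status bit (0x08 in TCO1_STS), and a further expiry with that bit latched resets the machine. The register is modelled as a plain byte with write-1-to-clear behaviour. `struct tco_v1_model`, `countdown_expires()` and `keepalive()` are hypothetical, and the counter reload itself is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCO_TIMEOUT_STS 0x08  /* bit cleared by outb(0x08, TCO1_STS) */

struct tco_v1_model {
	uint8_t tco1_sts;   /* stand-in for the TCO1_STS register */
	bool rebooted;
};

/* TCO1_STS is write-1-to-clear: every bit written as 1 is cleared. */
static void sts_write(struct tco_v1_model *t, uint8_t val)
{
	t->tco1_sts &= (uint8_t)~val;
}

/* One countdown reaching zero: the first expiry only latches the
 * status bit; an expiry with the bit already latched resets. */
static void countdown_expires(struct tco_v1_model *t)
{
	if (t->tco1_sts & TCO_TIMEOUT_STS)
		t->rebooted = true;
	else
		t->tco1_sts |= TCO_TIMEOUT_STS;
}

/* Keepalive: clear_sts=true models the updated patch, which clears the
 * latched bit before reloading; clear_sts=false models the old code. */
static void keepalive(struct tco_v1_model *t, bool clear_sts)
{
	if (clear_sts)
		sts_write(t, TCO_TIMEOUT_STS);
}
```

Without the clear, a pet that arrives after the first countdown has already expired leaves the bit latched, so the next expiry resets after a single countdown, roughly half the requested timeout.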

iTCO_wdt: fix TCO V1 timeout values and limits

For TCO V1 devices the programmed timeout was twice as long as
requested, because the driver did not account for the TCO V1 timer
having to count down twice before triggering the watchdog.
Also the timeout values in the module description and error
message were clarified.

Signed-off-by: Pádraig Brady <[email protected]>
Tested-by: Simon Kagstrom <[email protected]>

--- a/iTCO_wdt.c 2010-04-06 15:00:41.000000000 +0000
+++ b/iTCO_wdt.c 2010-04-07 16:08:27.000000000 +0000
@@ -391,8 +391,8 @@
#define WATCHDOG_HEARTBEAT 30 /* 30 sec default heartbeat */
static int heartbeat = WATCHDOG_HEARTBEAT; /* in seconds */
module_param(heartbeat, int, 0);
-MODULE_PARM_DESC(heartbeat, "Watchdog heartbeat in seconds. "
- "(2<heartbeat<39 (TCO v1) or 613 (TCO v2), default="
+MODULE_PARM_DESC(heartbeat, "Watchdog timeout in seconds. "
+ "(5..76 (TCO v1) or 3..614 (TCO v2), default="
__MODULE_STRING(WATCHDOG_HEARTBEAT) ")");

static int nowayout = WATCHDOG_NOWAYOUT;
@@ -408,8 +408,12 @@
static inline unsigned int seconds_to_ticks(int seconds)
{
/* the internal timer is stored as ticks which decrement
- * every 0.6 seconds */
- return (seconds * 10) / 6;
+ * every 0.6 seconds. For TCO v1 the timer counts down
+ * twice before triggering the watchdog */
+ if (iTCO_wdt_private.iTCO_version == 1)
+ return (seconds * 5) / 6;
+ else
+ return (seconds * 10) / 6;
}

static void iTCO_wdt_set_NO_REBOOT_bit(void)
@@ -521,10 +525,15 @@
iTCO_vendor_pre_keepalive(iTCO_wdt_private.ACPIBASE, heartbeat);

/* Reload the timer by writing to the TCO Timer Counter register */
- if (iTCO_wdt_private.iTCO_version == 2)
- outw(0x01, TCO_RLD);
- else if (iTCO_wdt_private.iTCO_version == 1)
+ if (iTCO_wdt_private.iTCO_version == 1) {
+ /* Reset the timeout status bit so that the timer
+ * needs to count down twice again before rebooting */
+ outb (0x08, TCO1_STS); /* write 1 to clear bit */
+
outb(0x01, TCO_RLD);
+ } else if (iTCO_wdt_private.iTCO_version == 2) {
+ outw(0x01, TCO_RLD);
+ }

spin_unlock(&iTCO_wdt_private.io_lock);
return 0;
@@ -843,9 +852,14 @@
if not reset to the default */
if (iTCO_wdt_set_heartbeat(heartbeat)) {
iTCO_wdt_set_heartbeat(WATCHDOG_HEARTBEAT);
+ int tco_min=5; int tco_max=76; /* TCO V1 */
+ if (iTCO_wdt_private.iTCO_version == 2) {
+ tco_min=3; tco_max=614;
+ }
printk(KERN_INFO PFX
- "heartbeat value must be 2 < heartbeat < 39 (TCO v1) "
- "or 613 (TCO v2), using %d\n", heartbeat);
+ "timeout value %d is not between %d and %d"
+ " inclusive, using %d\n",
+ heartbeat, tco_min, tco_max, WATCHDOG_HEARTBEAT);
}

ret = misc_register(&iTCO_wdt_miscdev);