2008-03-11 20:22:28

by Wim Van Sebroeck

Subject: Re: w83697hf_wdt.c stops watchdog on load

Hi Pádraig, Samuel,

> > | I noticed that this driver (which is based on my w83627hf_wdt.c driver)
> > | stops the watchdog when the driver is loaded.
> > |
> > | This seems like the wrong thing to do.
> > | What happens if the watchdog is enabled in the BIOS which is very common,
> > | and the software crashes between the watchdog module being loaded,
> > | and the watchdog being pinged?

Let's take a step back and look at the "API". The watchdog drivers in Linux
were developed when BIOSes only coped with the basic stuff. The API that
was created is basically the following:
* Loading the watchdog device driver initializes the watchdog and then
  enables user-space access via /dev/watchdog.
* The userspace watchdog daemon then
  1) starts the watchdog by opening /dev/watchdog,
  2) pings the watchdog (by writing to /dev/watchdog or by using an
     ioctl command on /dev/watchdog),
  3) stops the watchdog by closing the /dev/watchdog "device".
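
The three daemon steps above can be sketched in userspace C. This is only a minimal illustration, not code from any existing daemon: the function names wd_start, wd_ping and wd_stop are made up for the example, and the device path is a parameter so the sketch can be tried against an ordinary file instead of a real /dev/watchdog. It pings by writing a byte; the ioctl route (WDIOC_KEEPALIVE from <linux/watchdog.h>) is left out to keep the sketch portable.

```c
#include <fcntl.h>
#include <unistd.h>

/* Step 1: start the watchdog. Per the API described above, opening the
 * device starts it. `path` would be "/dev/watchdog" on a real system;
 * it is a parameter here only so the sketch can run against a plain file. */
int wd_start(const char *path)
{
    return open(path, O_WRONLY);   /* returns fd, or -1 on error */
}

/* Step 2: ping (keep alive) by writing one byte to the device.
 * Returns 0 on success, -1 on failure. */
int wd_ping(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Step 3: stop the watchdog by closing the device. */
int wd_stop(int fd)
{
    return close(fd);
}
```

A daemon would call wd_start() once, then wd_ping() in a loop with a sleep shorter than the hardware timeout, and wd_stop() on a clean shutdown.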

This means that the /dev/watchdog interface is the key to controlling the
watchdog driver. The default behaviour is and was that the watchdog does not
run. Why? Simply because the userspace daemon has to have control
over when the watchdog is started, stopped and pinged, and also because you
don't want the system to reboot before the userspace watchdog daemon has been
activated. The system operator can of course decide when to start
the userspace watchdog daemon (before loading other processes, or after the
web server is up because the daemon also monitors the web server, ...).

Based on the above explanation and API, I can confirm that the normal
behaviour is that the watchdog device driver stops the watchdog when loaded.
(Please also bear in mind that certain watchdog devices can only be started
and not stopped).

So the driver has the correct behaviour.

> > During the test phase of the module with users of the French servers
> > at dedibox.fr, I had one report of a bios-enabled watchdog which seemed
> > to be rebooting the machine during a long fsck (the module was compiled
> > in the kernel if I remember correctly).
> >
> > Isn't it a two-edged sword?
>
> True, but one should be able to set things up so that there is no race.
> I.E. if you add the autodisable, it should only be done as an option.
> Best have it =no by default, but at least one could turn this off.

In my opinion it should be the other way around: the default behaviour is to
stop the watchdog and to let userspace (the watchdog-daemon) control the
watchdog. So if we add a module parameter to take over the watchdog's
bios-setting, then the default behaviour should be to stop the watchdog and
add an option that takes the value from the bios.
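
To make the proposed policy concrete, here is a small userspace model of the load-time decision. It is a sketch under assumptions, not driver code: struct wdt_model and wdt_model_init are invented for the illustration, and the parameter name nodisable is only a placeholder for whatever the module option ends up being called.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hardware state; not a real driver struct. */
struct wdt_model {
    bool running;   /* true if the BIOS left the watchdog enabled */
};

/* Model of the load-time policy. `nodisable` plays the role of the
 * proposed module parameter (name assumed); false is the default.
 * Returns true if a running watchdog was stopped. */
bool wdt_model_init(struct wdt_model *w, bool nodisable)
{
    if (nodisable || !w->running)
        return false;      /* keep the BIOS setting, or nothing to stop */

    w->running = false;    /* default: stop and let userspace take control */
    return true;
}
```

With the default (nodisable false) a BIOS-enabled watchdog is stopped at load; with the option set, its state is left as the BIOS configured it.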

> A more general question though. Do any other watchdogs do this?
> Seems fundamentally wrong to me. Also aren't all long running operations
> like fsck done in userspace? (where the watchdog process can run in parallel).
> If you really can't get userspace running within 4 mins say, then
> I would suggest that you just disable the watchdog in the BIOS.

Yes, almost all watchdog drivers do this. Do not forget that there are also
watchdog devices around whose timeout/keepalive value ranges from only a few
seconds up to roughly 60 seconds, so waiting 4 minutes is not feasible for them.

Greetings,
Wim.


2008-03-11 20:28:46

by Samuel Tardieu

Subject: Re: w83697hf_wdt.c stops watchdog on load

On 11/03, Wim Van Sebroeck wrote:

| > > Isn't it a two-edged sword?
| >
| > True, but one should be able to set things up so that there is no race.
| > I.E. if you add the autodisable, it should only be done as an option.
| > Best have it =no by default, but at least one could turn this off.
|
| In my opinion it should be the other way around: the default behaviour is to
| stop the watchdog and to let userspace (the watchdog-daemon) control the
| watchdog. So if we add a module parameter to take over the watchdog's
| bios-setting, then the default behaviour should be to stop the watchdog and
| add an option that takes the value from the bios.

Hi Wim.

That was my intent: disable the watchdog by default, and add an option
not to disable it (leave it untouched, so getting the BIOS setting) at
module initialization, as Pádraig's request is totally reasonable.

Sam

2008-03-11 21:09:57

by Wim Van Sebroeck

Subject: Re: w83697hf_wdt.c stops watchdog on load

Hi Sam,

> That was my intent: disable the watchdog by default, and add an option
> not to disable it (leave it untouched, so getting the BIOS setting) at
> module initialization, as Pádraig's request is totally reasonable.

Ok, then we are on the same page. It indeed makes sense to do this
with watchdog devices that are capable of being controlled by the
BIOS and where the timeout/keepalive time can be set high enough.

What should also be looked at is when you take over the watchdog
settings from the bios, do you then take the same timeout/keepalive
value? In the pcwd-drivers I took the dip-switch value of the card
when the value was 0. You could do something similar here.
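
One way to read the "value was 0" rule is as a simple fallback: a requested timeout of 0 means "not set", so the value read from the hardware is used instead. The helper below is a hedged sketch of that reading only; wdt_pick_timeout is a made-up name and this is not the actual pcwd code.

```c
/* Returns the timeout to use, in seconds. A requested value of 0 is
 * treated as "not set" (an assumption of this sketch), in which case
 * the value read from the hardware (dip switches or BIOS) is used. */
int wdt_pick_timeout(int requested, int hw_value)
{
    return requested == 0 ? hw_value : requested;
}
```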

Greetings,
Wim.

2008-03-12 11:56:10

by Pádraig Brady

Subject: Re: w83697hf_wdt.c stops watchdog on load

Samuel Tardieu wrote:
> On 11/03, Wim Van Sebroeck wrote:
>
> | > > Isn't it a two-edged sword?
> | >
> | > True, but one should be able to set things up so that there is no race.
> | > I.E. if you add the autodisable, it should only be done as an option.
> | > Best have it =no by default, but at least one could turn this off.
> |
> | In my opinion it should be the other way around: the default behaviour is to
> | stop the watchdog and to let userspace (the watchdog-daemon) control the
> | watchdog. So if we add a module parameter to take over the watchdog's
> | bios-setting, then the default behaviour should be to stop the watchdog and
> | add an option that takes the value from the bios.
>
> Hi Wim.
>
> That was my intent: disable the watchdog by default

When doing this it would be useful to print a warning if the watchdog was running.
I.e. the following would be possible for the w83697:

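/* pseudocode: watchdog_already_running and disable_watchdog() are placeholders */
init() {
    if (watchdog_already_running) {
        printk(KERN_WARNING "warning, stopping watchdog. "
               "Use the nodisable option to keep running\n");
        disable_watchdog();
    }
}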

See the w83627 driver for how to determine whether the watchdog is running.

>, and add an option
> not to disable it (leave it untouched, so getting the BIOS setting) at
> module initialization, as Pádraig's request is totally reasonable.

cool, thanks.

Pádraig.

p.s. I'm still not sure it should default to turning off.
It would be unusual for userspace to take over 60s to _start_.
If that were the case then the user shouldn't enable the watchdog in the BIOS at all.

2008-03-12 12:00:48

by Samuel Tardieu

Subject: Re: w83697hf_wdt.c stops watchdog on load

On 12/03, Pádraig Brady wrote:

| When doing this it would be useful to print a warning if the watchdog
| was running.

Agreed.

| p.s. I'm still not sure it should default to turning off.
| It would be unusual for userspace to take over 60s to _start_.
| If that was the case then the user shouldn't enable the watchdog in the
| BIOS at all.

Most users I know who use this watchdog run it on a hosted
dedicated server, for which the boot messages are not available.
An fsck aborted because the watchdog triggered would not be
easy to spot. I prefer to err on the safe side from the user's point
of view, even though I guess what you see as being "on the safe side"
would be to keep the watchdog enabled :)

Sam