LinuxLists.cc - Re: [RFC] Watchdog device class

2006-08-15 17:13:39

Subject: Re: [RFC] Watchdog device class

Hi Rudolf,

> > It's not about different approaches: we have to find the best thing for watchdog
> > devicesi, so the best thing is to talk about pro's and con's and see what we should
> > do best. (I for instance didn't come to the sysfs part yet of my code (which would
> > be in watchdog_sysfs.c)
>
> Ok I will look into your code.

Please have a look at the experimental code in the linux-2.6-watchdog-experimental git tree
(http://www.kernel.org/git/?p=linux/kernel/git/wim/linux-2.6-watchdog-experimental.git;a=summary).

It's a first draft of the generic code. (it's split into tree main parts: the core code, the
/dev/watchdog code and the sysfs code). (Note: I used Greg's new "class" device setup).
And it basically uses the following operations and watchdog_device structure:

struct watchdog_ops {
/* mandatory routines */
int (*start)(struct watchdog_device *); /* operation = start watchdog */
int (*stop)(struct watchdog_device *); /* operation = stop watchdog */
int (*keepalive)(struct watchdog_device *); /* operation = send keepalive ping */
/* optional routines */
int (*set_heartbeat)(struct watchdog_device *,int); /* operation = set watchdog's heartbeat */
int (*get_status)(struct watchdog_device *,int *); /* operation = get the watchdog's status */
int (*get_timeleft)(struct watchdog_device *, int *); /* operation = get the time left before reboot */
int (*sys_restart)(struct watchdog_device *); /* operation = force a system_restart for rebooting */
};

struct watchdog_device {
unsigned char name[32]; /* The watchdog's 'identity' */
unsigned long options; /* The supported capabilities/options */
unsigned long firmware; /* The Watchdog's Firmware version */
int heartbeat; /* The watchdog's heartbeat */
int nowayout; /* The nowayout setting for this watchdog */
int bootstatus; /* The watchdog's bootstatus */
int temppanic; /* wether or not to panic on temperature trip's */
struct watchdog_ops *watchdog_ops; /* link to watchdog_ops */

/* watchdog status (register/unregister) state machine */
enum { WATCHDOG_UNINITIALIZED=0,
WATCHDOG_REGISTERED, /* completed register_watchdogdevice */
WATCHDOG_STARTED, /* watchdog device started */
WATCHDOG_STOPPED, /* watchdog device stopped */
WATCHDOG_UNREGISTERED, /* completed unregister_watchdogdevice */
} watchdog_state;

struct semaphore sem; /* locks this structure */

struct device *cdev; /* Sysfs watchdog class device */

/* From here on everything is device dependent */
void *private;
};

There are still a number of things to be added:
* have a list under register_watchdogdevice so that we can register multiple watchdog devices
(Note: only one watchdog device will be linked to /dev/watchdog)
* add pre-timeout stuff
* clean-up sysfs code
* do proper locking
* add documentation
* make docbook ready
* decide how we will handle the old temperature ioctl
...

All comments/suggestions are welcome. Let's see/discuss what's missing, what we should do different, ...

Greetings,
Wim.

2006-09-14 12:19:18

by Rudolf Marek

[permalink] [raw]

Subject: Re: [RFC] Watchdog device class

Hi Wim,

Sorry for the delay I have been quite busy ...

> It's a first draft of the generic code. (it's split into tree main parts: the core code, the
> /dev/watchdog code and the sysfs code). (Note: I used Greg's new "class" device setup).
> And it basically uses the following operations and watchdog_device structure:
>
> struct watchdog_ops {
> /* mandatory routines */
> int (*start)(struct watchdog_device *); /* operation = start watchdog */
> int (*stop)(struct watchdog_device *); /* operation = stop watchdog */
> int (*keepalive)(struct watchdog_device *); /* operation = send keepalive ping */
> /* optional routines */
> int (*set_heartbeat)(struct watchdog_device *,int); /* operation = set watchdog's heartbeat */
> int (*get_status)(struct watchdog_device *,int *); /* operation = get the watchdog's status */
> int (*get_timeleft)(struct watchdog_device *, int *); /* operation = get the time left before reboot */
> int (*sys_restart)(struct watchdog_device *); /* operation = force a system_restart for rebooting */
> };
>
> struct watchdog_device {
> unsigned char name[32]; /* The watchdog's 'identity' */
> unsigned long options; /* The supported capabilities/options */
> unsigned long firmware; /* The Watchdog's Firmware version */
> int heartbeat; /* The watchdog's heartbeat */
> int nowayout; /* The nowayout setting for this watchdog */
> int bootstatus; /* The watchdog's bootstatus */
> int temppanic; /* wether or not to panic on temperature trip's */
> struct watchdog_ops *watchdog_ops; /* link to watchdog_ops */
>
> /* watchdog status (register/unregister) state machine */
> enum { WATCHDOG_UNINITIALIZED=0,
> WATCHDOG_REGISTERED, /* completed register_watchdogdevice */
> WATCHDOG_STARTED, /* watchdog device started */
> WATCHDOG_STOPPED, /* watchdog device stopped */
> WATCHDOG_UNREGISTERED, /* completed unregister_watchdogdevice */
> } watchdog_state;
>
> struct semaphore sem; /* locks this structure */
>
> struct device *cdev; /* Sysfs watchdog class device */
>
> /* From here on everything is device dependent */
> void *private;
> };

Looks good to me. I would second Alan's idea about the standard time structure,
because for embedded stuff even [s] might be too long.

> There are still a number of things to be added:
> * have a list under register_watchdogdevice so that we can register multiple watchdog devices
> (Note: only one watchdog device will be linked to /dev/watchdog)
> * add pre-timeout stuff
> * clean-up sysfs code
> * do proper locking
> * add documentation
> * make docbook ready

I would add:
* infrastructure for too fast or too slow hardware watchdogs.

The idea is following: If the watchdog is too fast (1s timeout like some
winbonds) there could be in watchdog class a common code that would ping this
watchdog often as needed and the real ping from userspace would go to the soft
one... So same as it is now, but part of the class core instead of part of each
driver.

If the design is done correctly, just a time constants could be switched, so the
soft dog is required to be "ping"ed more often then hardware. This would solve
the other problem with 1m hw watchdogs, which are way too "slow".

> * decide how we will handle the old temperature ioctl

Drop? Or there is one easy way. Let the watchdog core call the "private" ioctl
for unknown/temp IOCTL commands - routine ptr would be defined in the structure
above or NULL. We can mark this interface obsolete, and later remove that
private ioctl routine and the field in the watchdog_device structure...

Have you any clue if this IOCTL is used actually by some util? I googled a bit
but without any luck.

As for the temps... Enough should be the registration to hwmon class and
creation of few sysfs callbacks. The interface is stable and well defined. Just
check the Documentation/hwmon/sysfs-interface. I cannot recall if the userspace
app will be able to find out the currect hwmon directory in sysfs from a
watchdog one, but I guess there is somehow assured this class hiearchy...

As for the sysfs attributes: I would suggest to use [ms] for time stuff. Rest
looks quite straightforward.

I'm willing to help with the implementation, just tell me what you have done
already from that list ;)

Sorry for the delay, I hope I will be able to schedule this stuff more often now ;)

Regards
Rudolf