2002-09-24 19:43:39

by Jeff Garzik

[permalink] [raw]
Subject: alternate event logging proposal

Here's my suggestion, which will maintain compatibility with 2.2 and 2.4
kernels, and require a -very- minimal kernel update, to support event
logging.

When CONFIG_EVLOG is defined, printk outputs a tagged ASCII format:
KERN_xxx, the format string (verbatim), and 0 or more argument values.
Since this loses the argument tags (names) from the current IBM event
logging code, you can re-gain this info by post-processing the printk.
Post-processing can be done by a hacked cpp (see http://www.tinycc.org/
for a tiny, steal-able one) or a simple Perl script.

The only real difference at that point between IBM's implementation and
my proposal is that some argument tags are expressions, not simple C
identifiers (flag ? "true" : "false") instead of "flag". This is a
problem if you want to do SQLish queries on the tags in the event log,
but its a work-around-able problem, IMO.

This scheme should fit with the backend code already created by IBM,
which isn't too bad IMO.

Even if the details are disagreeable, I really think that we should work
towards making printk (or 'warn', 'error', etc.) log the desired
information, while still keeping older drivers useful, and keeping
drivers source-compatible with older kernels.


Now, turning to a tangent topic that relates to either scheme...

With either your proposal or mine, event logging is far more useful if
similar drivers spit out similar diagnostics. i.e. it's less useful if
8139too net driver spits out 'status16' in one interrupt event, and
8139cp net driver spits out 'status32' in another. Though they are
different hardware and the values mean different things, my point is the
concepts are similar, and thus better diagnostics are achieved with
subsystem diagnostic standards.

Such standards are in actuality independent of event logging per se, but
if IBM wants to push this thing, I would like to see some proposals as
to what IBM actually wants drivers to log. I have not seen that at all,
and think that such proposals should be an integral part of an event
logging system. Otherwise the diagnostics are less useful, and IBM
would have failed to demonstrate an adequate grasp of the problem domain
[which then leads to other, typical software engineering problems...]

"What do you want to log?" is as important to me as "how do you want to
log it?" And the answers to the two questions are very much intertwined.

Comments?

Jeff




2002-09-24 19:53:31

by Chris Friesen

[permalink] [raw]
Subject: Re: alternate event logging proposal

Jeff Garzik wrote:

> "What do you want to log?" is as important to me as "how do you want to
> log it?" And the answers to the two questions are very much intertwined.

Also related is "how can userspace be notified of kernel events?".
There is no way for a userspace app to be notified that, for instance,
an ATM device got a loss of signal. The drivers print it out, but the
userspace app has no clue.

I think that there should be a relatively generic way for drivers to
distribute events such as this and for userspace to register interest in
them.

Maybe netlink is the way to go, but its not exactly a simple interface.

Chris

2002-09-24 19:58:31

by Jeff Garzik

[permalink] [raw]
Subject: Re: alternate event logging proposal

Chris Friesen wrote:
> Also related is "how can userspace be notified of kernel events?". There
> is no way for a userspace app to be notified that, for instance, an ATM
> device got a loss of signal. The drivers print it out, but the
> userspace app has no clue.

> Maybe netlink is the way to go, but its not exactly a simple interface.


Well, the eventual intention [whenever <somebody> codes it up] for
ethernet devices is to have netlink proactively deliver link up/down
events, and perhaps other events. For this specific problem, at least,
I think netlink is the way to go. ATM drivers, or ethernet drivers,
simply need some helper functions they can call. The library code for
those helper functions is the only place where netlink complexity is
needed...

In existing drivers that call netif_carrier_{on,off}, it is perhaps even
possible to have them send netlink messages with no driver-specific code
changes at all.

Jeff



2002-09-24 20:04:41

by Jeff Garzik

[permalink] [raw]
Subject: Re: alternate event logging proposal

Chris Friesen wrote:
> Also related is "how can userspace be notified of kernel events?". There
> is no way for a userspace app to be notified that, for instance, an ATM
> device got a loss of signal. The drivers print it out, but the
> userspace app has no clue.


(sorry for the second reply)

To address your more general point, a general way to notify interested,
credentialed (is that a word?) 3rd party processes of device events
would indeed be useful. Since such events are essential out-of-band
info, netlink might indeed be applicable.

Jeff



2002-09-24 20:49:55

by Daniel Stekloff

[permalink] [raw]
Subject: Re: [evlog-dev] alternate event logging proposal

On Tuesday 24 September 2002 12:48 pm, Jeff Garzik wrote:

[snip)

> Now, turning to a tangent topic that relates to either scheme...
>
> With either your proposal or mine, event logging is far more useful if
> similar drivers spit out similar diagnostics. i.e. it's less useful if
> 8139too net driver spits out 'status16' in one interrupt event, and
> 8139cp net driver spits out 'status32' in another. Though they are
> different hardware and the values mean different things, my point is the
> concepts are similar, and thus better diagnostics are achieved with
> subsystem diagnostic standards.
>
> Such standards are in actuality independent of event logging per se, but
> if IBM wants to push this thing, I would like to see some proposals as
> to what IBM actually wants drivers to log. I have not seen that at all,
> and think that such proposals should be an integral part of an event
> logging system. Otherwise the diagnostics are less useful, and IBM
> would have failed to demonstrate an adequate grasp of the problem domain
> [which then leads to other, typical software engineering problems...]
>
> "What do you want to log?" is as important to me as "how do you want to
> log it?" And the answers to the two questions are very much intertwined.
>
> Comments?


I agree with you that there needs to be a strategy for what is logged by a
driver and how it is logged. We believe log analysis is an essential part of
diagnosing errors. Log messages, when generated consistently, could indicate
what drivers were loaded, when they were loaded, what their current version
is, and what errors they have encountered. How the messages are formatted,
whether they follow certain rules, can greatly aid User Space diagnostic
applications.

We propose standardizing what should be logged and how those log messages
should look.

What:

- Drivers should log initialization and uninitialization information for both
drivers and devices. Knowning when a driver is loaded or unloaded is useful
information.

- Initialization information should include name of the device, name of the
driver, and the current version or release levels. Any errors encountered
during initialization - e.g. from running self tests - should be properly
logged.

- Errors during operation should be logged.

How:

- Messages should be human readable.

- Messages should be succinct.

- Messages should uniquely identify the driver and the device, such as many
drivers already do by prefixing messages with the driver name.

- Beware of log pollution - log only relevant information. Avoid frequently
recurring messages.

- Use defined severity levels.

- Make sure that consitent logging and terminology is used throughout the
driver.

Most of what I just outlined is common sense, but it's worth stating.

Any comments?


Dan



> Jeff
>
>
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> evlog-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/evlog-developers

2002-09-25 01:33:24

by Tim Hockin

[permalink] [raw]
Subject: Re: alternate event logging proposal

> To what level would you see this going?

Not sure - someone talked about receiving all sorts of device events.
Certainly do-able, but it may be desirable to really limit the domain.

> I'm currently doing some documentation work on the input subsystem, and it
> produces events (/dev/input/eventX) for every mouse movement, every key press
> (and release), etc. Now most of the application interested in those events

So I'd say this level of heavy traffic probably deserves it's own event
mechanism that is more lightweight than eventd. The more generic it is, the
more each event costs (currently O(n) - 1 regex comparison for each
client/config). Putting high volume mouse/kb events would likely suck.

> > True, and if a dev_event file were created, I'd consider doing it that way.
> > But in that case it's easier for apps to talk to eventd (nee acpid) and get
> > only the messages they want.

> I think that the eventd advantage is that it is easy to do integration with
> non-event aware apps. Example: eventd re-writes the config file and SIGHUPs
> the application.

This eventd I'm speaking of is what currently exists as acpid. It reads
ACPI events and does just that - runs some user-defined handlers and alerts
processes on a UNIX socket.

It does make it possible to do things like not modify DHCP client at all,
and just thump it with a HUP. It also means everyone needs to be running
eventd. And then you still have the 'need-an-edge' problem. If DHCP starts
and hangs because it can't find the network (assume an unplugged NIC),
nothing is going to come along and tell it there is no link because there
was no edge.

By making something like (just for argument) /netdevfs/eth0/link, dhcp could
read that, find a "0" and go to sleep on a poll() of that same fd. But you
need to modify DHCP.

Advantages and disadvantages to all models. Which is why I brought it up
for debate. IFF it is even attainable for 2.5.x. It certainly is for just
netif style events. On a broader scope? Maybe not. Another drawback of
the generic event file is that you now have a lot more people to please with
a generic enough protocol.

Tim

2002-09-25 01:16:26

by Brad Hards

[permalink] [raw]
Subject: Re: alternate event logging proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 25 Sep 2002 10:47, Tim Hockin wrote:
> > > a single device_event file that a daemon reads and dispatches events (I
> > > like this one because the daemon is already written, just poorly named
> > > - acpid)
> >
> > Couldn't you just have the message sent to every process that has
> > opened the file (and have every interested process open the file and
> > read it in a non-blocking or blocking mode?)
>
> Sure, but then every process that is concerned with a single event has to
> not only receive every event, but parse every event. And if this is to be
> truly generic, that could be a lot of events.
To what level would you see this going?
I'm currently doing some documentation work on the input subsystem, and it
produces events (/dev/input/eventX) for every mouse movement, every key press
(and release), etc. Now most of the application interested in those events
will get them via X (we just need to interface the input subsystem event
interface to the X event interface). I see this as a separate daemon or
daemons.
Basically you get eventd on every system, and inputd for each console (in a
multiheaded, multiuser setup).

> > That seems to negate the need for something like acpid, but it does
> > not preclude it's use.
>
> True, and if a dev_event file were created, I'd consider doing it that way.
> But in that case it's easier for apps to talk to eventd (nee acpid) and get
> only the messages they want.
I think that the eventd advantage is that it is easy to do integration with
non-event aware apps. Example: eventd re-writes the config file and SIGHUPs
the application.

Brad

- --
http://conf.linux.org.au. 22-25Jan2003. Perth, Aust. Tickets booked.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE9kQ4NW6pHgIdAuOMRAtEyAKC2t5lKponBvUHH14bONYfjbSWFxgCeMh1O
9pjS0UjK627edrI8WJDBXp0=
=V0sd
-----END PGP SIGNATURE-----

2002-09-25 00:43:17

by Tim Hockin

[permalink] [raw]
Subject: Re: alternate event logging proposal

> > a single device_event file that a daemon reads and dispatches events (I
> > like this one because the daemon is already written, just poorly named -
> > acpid)
>
> Couldn't you just have the message sent to every process that has
> opened the file (and have every interested process open the file and
> read it in a non-blocking or blocking mode?)

Sure, but then every process that is concerned with a single event has to
not only receive every event, but parse every event. And if this is to be
truly generic, that could be a lot of events.

> That seems to negate the need for something like acpid, but it does
> not preclude it's use.

True, and if a dev_event file were created, I'd consider doing it that way.
But in that case it's easier for apps to talk to eventd (nee acpid) and get
only the messages they want.

Tim

2002-09-24 23:54:15

by Tim Hockin

[permalink] [raw]
Subject: Re: alternate event logging proposal

Brad Hards wrote:

>>You really want something where a userspace app can sleep on an fd, to
>>be awakened when link changes (or some other interesting event occurs)
>
> Maybe - I've been thinking of a "hotplug" daemon, that can take notifications
> from the kernel _and_ from other userspace apps. The integrated solution
> somehow needs to incorporate device hotplugging (eg USB, PCI), network device
> events (netlink), userspace reconfiguration (eg X colour depth and
> resolution) and maybe network infrastructure (external to the machine,
> probably SLPv2 or similar), and reconfigure kernel and applications to match.

See my previous about acpid - it is capable of most of this. In short:

Open kernel event file
read config files: map regexes to actions
open a named UNIX socket
while 1
wait for event or data on socket
if it's an event {
read event
for each config'ed regex {
if it matches this event
run the associated action
}
for each UNIX connection {
notify the connection of the event
}
} else if it's a connection on the socket {
add the connection to the list of notifications
}
}


Now it would be easy to make UNIX-connecting apps specify one or more
regexes, instead of getting broadcasted. It would be similarly easy to
make it read multiple sources and handle that - acpi, dev_events,
user_events (UNIX socket or FIFO).

http://acpid.sourceforge.net - it's kind of stale, because it does
everything it needs to do for now :) It's small, well tested and easy
to understand. Best of all, it's already written.

Tim
--
Tim Hockin
Systems Software Engineer
Sun Microsystems, Linux Kernel Engineering
[email protected]

2002-09-25 00:05:45

by Ben Greear

[permalink] [raw]
Subject: Re: alternate event logging proposal

Tim Hockin wrote:

> a single device_event file that a daemon reads and dispatches events (I
> like this one because the daemon is already written, just poorly named -
> acpid)

Couldn't you just have the message sent to every process that has
opened the file (and have every interested process open the file and
read it in a non-blocking or blocking mode?)

That seems to negate the need for something like acpid, but it does
not preclude it's use.

>
> For things like netif_carrier, poll() is probably best - the DHCP client
> can be fully self contained, and not need an eventd to alert it to a
> signal change. Of course, acpid does support UNIX socket connections
> from apps like DHCP....

I would prefer an event, and it can still be self contained if my
idea above is actually workable.

Ben

>
>
>


--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear


2002-09-24 23:38:51

by Brad Hards

[permalink] [raw]
Subject: Re: alternate event logging proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 25 Sep 2002 09:31, Jeff Garzik wrote:
> Brad Hards wrote:
> > I liked the /sbin/hotplug arrangement (aka call_usermode_helper). In
> > fact, my plan was to add the call_usermode_helper call to the
> > netif_carrier_[on,off] functions. Unfortuantely, I've been to too many of
> > Rusty's talks, and know that calling a function that is only safe in user
> > context is unlikely to be a good idea in netif_carrier_[on,off], which
> > are more than likely running in interrupt context.
>
> You really want something where a userspace app can sleep on an fd, to
> be awakened when link changes (or some other interesting event occurs)
Maybe - I've been thinking of a "hotplug" daemon, that can take notifications
from the kernel _and_ from other userspace apps. The integrated solution
somehow needs to incorporate device hotplugging (eg USB, PCI), network device
events (netlink), userspace reconfiguration (eg X colour depth and
resolution) and maybe network infrastructure (external to the machine,
probably SLPv2 or similar), and reconfigure kernel and applications to match.

Brad
- --
http://conf.linux.org.au. 22-25Jan2003. Perth, Aust. Tickets booked.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD4DBQE9kPcwW6pHgIdAuOMRAoaDAJ9PnK962eJCuKdobU64SfY/2SRemQCYxSUS
CfTiTN9hOq+gfldzcgDzCQ==
=bDjx
-----END PGP SIGNATURE-----

2002-09-24 23:27:00

by Jeff Garzik

[permalink] [raw]
Subject: Re: alternate event logging proposal

Brad Hards wrote:
> I liked the /sbin/hotplug arrangement (aka call_usermode_helper). In fact, my
> plan was to add the call_usermode_helper call to the netif_carrier_[on,off]
> functions. Unfortuantely, I've been to too many of Rusty's talks, and know
> that calling a function that is only safe in user context is unlikely to be a
> good idea in netif_carrier_[on,off], which are more than likely running in
> interrupt context.


You really want something where a userspace app can sleep on an fd, to
be awakened when link changes (or some other interesting event occurs)

Jeff



2002-09-24 23:34:14

by Tim Hockin

[permalink] [raw]
Subject: Re: alternate event logging proposal

Jeff Garzik wrote:
> Brad Hards wrote:
>
>> I liked the /sbin/hotplug arrangement (aka call_usermode_helper). In
>> fact, my plan was to add the call_usermode_helper call to the
>> netif_carrier_[on,off] functions. Unfortuantely, I've been to too many
>> of Rusty's talks, and know that calling a function that is only safe
>> in user context is unlikely to be a good idea in
>> netif_carrier_[on,off], which are more than likely running in
>> interrupt context.
>
>
>
> You really want something where a userspace app can sleep on an fd, to
> be awakened when link changes (or some other interesting event occurs)

I tend to agree - I like either of the models:

a bunch of little single-value files that can be polled and read

or

a single device_event file that a daemon reads and dispatches events (I
like this one because the daemon is already written, just poorly named -
acpid)

For things like netif_carrier, poll() is probably best - the DHCP client
can be fully self contained, and not need an eventd to alert it to a
signal change. Of course, acpid does support UNIX socket connections
from apps like DHCP....



--
Tim Hockin
Systems Software Engineer
Sun Microsystems, Linux Kernel Engineering
[email protected]

2002-09-24 22:34:05

by Brad Hards

[permalink] [raw]
Subject: Re: alternate event logging proposal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 25 Sep 2002 06:54, Tim Hockin wrote:
> Jeff Garzik wrote:
> > In existing drivers that call netif_carrier_{on,off}, it is perhaps even
> > possible to have them send netlink messages with no driver-specific code
> > changes at all.
>
> This is something that I have been asked to look at, here. Jeff, how
> (or is?) any of the netlink info pushed up to userspace? The idea that
> someone came to me with was to have something in (driverfs? netdevfs?)
> that was poll()able and read()able. read() giving current state, and
> poll() waking on changes. Or maybe two different files, but something.
> Of course it'd be greate to be generic. I just assumed it would come
> from netif_* for netdevices.
>
> Is this something planned? wanted? something I should bang out into
> 2.5.x before end of next month?
Wanted.
Example: In desktop / consumer applications, you can use link-state change to
do things like invoke the DHCP client daemon, or get yourself a link-local IP
address.

> We could have a generic device-events file (akin to acpi events) that a
> daemon dispatches events into user-land, or we could have a kernel->user
> callback a la /sbin/hotplug, or we could have many device/subsys
> specific files.
>
> Anyone have a preference?
I liked the /sbin/hotplug arrangement (aka call_usermode_helper). In fact, my
plan was to add the call_usermode_helper call to the netif_carrier_[on,off]
functions. Unfortuantely, I've been to too many of Rusty's talks, and know
that calling a function that is only safe in user context is unlikely to be a
good idea in netif_carrier_[on,off], which are more than likely running in
interrupt context.

Conceptually, I don't see (hot-plugging) a CAT-5 cable into a NIC to be that
much different (from a userland view) to plugging the NIC into the PCI bus.

My big problem is that I am clueless, and call_usermode_helper isn't nice
code. If someone in kernel-land could make call_usermode_helper safe in
interrupt context, at least link-state reporting would be fairly trivial. It
shouldn't be that hard - it's already using keventd. I just have no idea
about clone_thread and stuff like that.

Brad

- --
http://conf.linux.org.au. 22-25Jan2003. Perth, Aust. Tickets booked.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE9kOgDW6pHgIdAuOMRAqojAJ0aiXkHtK0eo6/1Bg+Yo8zSzBCMSQCfXK0x
5CSmDWhwRiamJwttaxF6Eac=
=hqKf
-----END PGP SIGNATURE-----

2002-09-24 20:50:27

by Tim Hockin

[permalink] [raw]
Subject: Re: alternate event logging proposal

Jeff Garzik wrote:
> In existing drivers that call netif_carrier_{on,off}, it is perhaps even
> possible to have them send netlink messages with no driver-specific code
> changes at all.

This is something that I have been asked to look at, here. Jeff, how
(or is?) any of the netlink info pushed up to userspace? The idea that
someone came to me with was to have something in (driverfs? netdevfs?)
that was poll()able and read()able. read() giving current state, and
poll() waking on changes. Or maybe two different files, but something.
Of course it'd be greate to be generic. I just assumed it would come
from netif_* for netdevices.

Is this something planned? wanted? something I should bang out into
2.5.x before end of next month?

We could have a generic device-events file (akin to acpi events) that a
daemon dispatches events into user-land, or we could have a kernel->user
callback a la /sbin/hotplug, or we could have many device/subsys
specific files.

Anyone have a preference?

Tim



--
Tim Hockin
Systems Software Engineer
Sun Microsystems, Linux Kernel Engineering
[email protected]

2002-09-25 14:38:33

by Lars Marowsky-Bree

[permalink] [raw]
Subject: Re: alternate event logging proposal

On 2002-09-24T15:57:36,
Chris Friesen <[email protected]> said:

I have cut the Cc/To list severely. Couldn't stand it.

> >"What do you want to log?" is as important to me as "how do you want to
> >log it?" And the answers to the two questions are very much intertwined.
> Also related is "how can userspace be notified of kernel events?".
> There is no way for a userspace app to be notified that, for instance,
> an ATM device got a loss of signal. The drivers print it out, but the
> userspace app has no clue.

This is a very generic problem. For example, the Open Clustering Framework
also has a lots of events going back and forth - node membership changes,
distributed locks gained or lost, messages pending etc.

The interesting issue here is that the provider or consumer may be implemented
in kernel or user space depending on the task and the given implementation;
ie, what we would ideally need is a common coherent infrastructure for event
distribution on Linux, as I expect this to be a common task for many
scenarios.

Joe DiMartino of OSDL has developed a very nice Event API (I think) for
user-space (a fixed version with some updates we agreed upon in the last two
weeks is pending), and we are currently evaluating how to interface with the
kernel space.

It would be _very nice_ if someone would pick up the lead wrt to "Event
Processing", so that work is not duplicated between so many (already
overloaded) groups.

Just my few euro-cents.


Sincerely,
Lars Marowsky-Br?e <[email protected]>

--
Principal Squirrel
Research and Development, SuSE Linux AG

``Immortality is an adequate definition of high availability for me.''
--- Gregory F. Pfister

2002-10-01 22:21:44

by Pavel Machek

[permalink] [raw]
Subject: Re: [evlog-dev] alternate event logging proposal

Hi!


> - Drivers should log initialization and uninitialization information for both
> drivers and devices. Knowning when a driver is loaded or unloaded is useful
> information.

[snip]

> - Beware of log pollution - log only relevant information. Avoid frequently
> recurring messages.

You should not report *driver* initialization but only hardware one,
or you are going to overrun dmesg very fast (log pollution).

Driver version is useless -- you already know what kernel you are
booting -- right?
Pavel
--
When do you have heart between your knees?