2003-05-24 06:11:29

by Jim Keniston

Subject: [RFC] [PATCH] Net device error logging (revised)

diff -Naur linux.org/include/linux/netdevice.h linux.netdev_printk.patched/include/linux/netdevice.h
--- linux.org/include/linux/netdevice.h Fri May 23 21:35:22 2003
+++ linux.netdev_printk.patched/include/linux/netdevice.h Fri May 23 21:35:22 2003
@@ -444,6 +444,9 @@
struct divert_blk *divert;
#endif /* CONFIG_NET_DIVERT */

+ /* NETIF_MSG_* flags to control the types of events we log */
+ int msg_enable;
+
/* generic object representation */
struct kobject kobj;
};
@@ -708,6 +711,8 @@
NETIF_MSG_PKTDATA = 0x1000,
NETIF_MSG_HW = 0x2000,
NETIF_MSG_WOL = 0x4000,
+ NETIF_MSG_ALL = -1, /* always log message */
+ NETIF_MSG_ = -1 /* always log message */
};

#define netif_msg_drv(p) ((p)->msg_enable & NETIF_MSG_DRV)
@@ -835,6 +840,35 @@
extern void dev_clear_fastroute(struct net_device *dev);
#endif

+/* debugging and troubleshooting/diagnostic helpers. */
+/**
+ * netdev_printk() - Log message with interface name, driver name, bus ID.
+ * @sevlevel: severity level -- e.g., KERN_INFO
+ * @netdev: net_device pointer
+ * @msglevel: a standard message-level flag with the NETIF_MSG_ prefix removed.
+ * Unless msglevel is ALL (i.e., NETIF_MSG_ALL) or omitted, log the message
+ * only if that flag is set in netdev->msg_enable.
+ * @format: as with printk
+ * @arg: as with printk
+ */
+#define netdev_printk(sevlevel, netdev, msglevel, format, arg...) \
+ (void) ( (NETIF_MSG_##msglevel == NETIF_MSG_ALL \
+ || ((netdev)->msg_enable & NETIF_MSG_##msglevel)) && \
+ printk(sevlevel "%s: " format , (netdev)->name , ## arg) )
+
+#ifdef DEBUG
+#define netdev_dbg(netdev, msglevel, format, arg...) \
+ netdev_printk(KERN_DEBUG , netdev , msglevel , format , ## arg)
+#else
+#define netdev_dbg(netdev, msglevel, format, arg...) do {} while (0)
+#endif
+
+#define netdev_err(netdev, msglevel, format, arg...) \
+ netdev_printk(KERN_ERR , netdev , msglevel , format , ## arg)
+#define netdev_info(netdev, msglevel, format, arg...) \
+ netdev_printk(KERN_INFO , netdev , msglevel , format , ## arg)
+#define netdev_warn(netdev, msglevel, format, arg...) \
+ netdev_printk(KERN_WARNING , netdev , msglevel , format , ## arg)

#endif /* __KERNEL__ */
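
(A minimal userspace sketch of how the gating in netdev_printk() behaves; it
is not part of the patch. It assumes a cut-down struct net_device and a
hypothetical mock_printk() that stands in for printk and counts its calls.
The NETIF_MSG_* values are the ones visible in the hunk above, and the macro
relies on the same GNU C named-variadic syntax the patch uses.)

```c
#include <stdarg.h>
#include <stdio.h>

/* Severity prefixes as the kernel defined them at the time. */
#define KERN_ERR  "<3>"
#define KERN_INFO "<6>"

/* Flag values copied from the patch context; the rest are left out. */
enum {
	NETIF_MSG_HW  = 0x2000,
	NETIF_MSG_WOL = 0x4000,
	NETIF_MSG_ALL = -1,	/* always log message */
	NETIF_MSG_    = -1	/* always log message (msglevel omitted) */
};

/* Hypothetical stand-in: only the two fields the macro touches. */
struct net_device {
	char name[16];
	int msg_enable;
};

static int mock_printk_calls;	/* how many messages actually got logged */

/* Hypothetical printk replacement: prints to stdout and counts the call. */
static int mock_printk(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vprintf(fmt, ap);
	va_end(ap);
	mock_printk_calls++;
	return n;
}

/* Same shape as the macro in the patch, with printk swapped for the mock. */
#define netdev_printk(sevlevel, netdev, msglevel, format, arg...) \
	(void) ((NETIF_MSG_##msglevel == NETIF_MSG_ALL \
	|| ((netdev)->msg_enable & NETIF_MSG_##msglevel)) && \
	mock_printk(sevlevel "%s: " format , (netdev)->name , ## arg))

/* Returns the number of messages that passed the msg_enable check. */
int demo(void)
{
	struct net_device dev = { "eth0", NETIF_MSG_HW };

	mock_printk_calls = 0;
	netdev_printk(KERN_ERR, &dev, HW, "tx timeout, status %#x\n", 0x40);
	netdev_printk(KERN_INFO, &dev, WOL, "wake-on-LAN armed\n");
	netdev_printk(KERN_INFO, &dev, ALL, "link up\n");
	netdev_printk(KERN_INFO, &dev, , "link up (level omitted)\n");
	return mock_printk_calls;
}
```

With msg_enable set to NETIF_MSG_HW, demo() logs three of the four messages.
The WOL message is filtered out, while ALL and the omitted level bypass the
check because NETIF_MSG_ALL and NETIF_MSG_ are both -1.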


Attachments:
netdev_printk-2.5.69.patch (2.13 kB)

2003-05-24 07:53:52

by Valdis Klētnieks

Subject: Re: [RFC] [PATCH] Net device error logging (revised)

On Fri, 23 May 2003 23:21:24 PDT, Jim Keniston said:

> - With Steve Hemminger's creation of the "net" device class a few days
> ago, the network device's interface name is now sufficient to find the
> information about the underlying device in sysfs (even without running
> ethtool). So these macros no longer log the device's driver name and
> bus ID.

Is something *else* logging the driver name and bus ID?

Just because an interface is called 'eth2' when the message is logged
doesn't mean it's still eth2 when you actually *read* the message.
And no, you *can't* rely on finding the "renaming bus-ID to ethN" message
in the logs - if the system is unstable the last bit of local logging may
go bye-bye if not synced, and messages about network hardware problems are
*very* prone to not making it to the syslog server (I wonder why? ;).

Been there, done that, it's not fun. Almost swapped out the wrong eth1
a *second* time before realizing what was really going on...



2003-05-27 23:06:46

by Jim Keniston

Subject: Re: [RFC] [PATCH] Net device error logging (revised)

Valdis Klētnieks wrote:
>
> On Fri, 23 May 2003 23:21:24 PDT, Jim Keniston said:
>
> > - With Steve Hemminger's creation of the "net" device class a few days
> > ago, the network device's interface name is now sufficient to find the
> > information about the underlying device in sysfs (even without running
> > ethtool). So these macros no longer log the device's driver name and
> > bus ID.
>
> Is something *else* logging the driver name and bus ID?

Short answer: Not in Linux 2.5.70.

Long answer: Some folks in the LTC are designing an error-log analysis (ELA)
system that will have access to sysfs as well as the event stream coming out
of the kernel. Once the network interface is registered with sysfs -- in
v2.5.70, it's via netdev_register_sysfs, as called from register_netdevice --
you can find the net_device's info in sysfs based on the interface name.
The aforementioned ELA system could then annotate the event record with the
desired additional data out of sysfs (including driver name and bus ID).
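
To make the annotation step concrete, here is a rough userspace sketch of
what such a tool might do. Everything in it is an assumption for
illustration: sysfs_link_basename() and annotate_event() are hypothetical
names, the class directory is passed in (normally it would be
/sys/class/net), and it assumes each interface directory has a "device"
symlink whose target ends in the bus ID, with a "driver" symlink beneath it,
as on current kernels. The 2.5.70 layout may differ in detail.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Read the symlink <class_root>/<ifname>/<link> and copy the last path
 * component of its target into out. Returns 0 on success, -1 on failure;
 * out is left untouched on failure.
 */
int sysfs_link_basename(const char *class_root, const char *ifname,
			const char *link, char *out, size_t outlen)
{
	char path[256], target[256];
	const char *base;
	ssize_t n;
	int len;

	len = snprintf(path, sizeof(path), "%s/%s/%s", class_root, ifname, link);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	n = readlink(path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';	/* readlink does not terminate the string */

	base = strrchr(target, '/');
	base = base ? base + 1 : target;
	if (strlen(base) >= outlen)
		return -1;
	strcpy(out, base);
	return 0;
}

/*
 * Append "[driver=... bus_id=...]" to an event message. A field that
 * cannot be looked up is reported as "?". Returns 0 on success, -1 if
 * the result did not fit in out.
 */
int annotate_event(const char *class_root, const char *ifname,
		   const char *msg, char *out, size_t outlen)
{
	char bus_id[64] = "?", driver[64] = "?";
	int len;

	/* A failed lookup leaves the "?" placeholder in place. */
	sysfs_link_basename(class_root, ifname, "device", bus_id, sizeof(bus_id));
	sysfs_link_basename(class_root, ifname, "device/driver", driver,
			    sizeof(driver));

	len = snprintf(out, outlen, "%s [driver=%s bus_id=%s]", msg, driver, bus_id);
	return (len < 0 || (size_t)len >= outlen) ? -1 : 0;
}
```

Note that the lookup happens when the record is annotated, not when the
message was logged, so it only helps if the interface still has the same
name at that point.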

Before the net_device is registered (at least), you'd presumably want to log
the driver name and bus ID. One obvious way to do this is to have the probe
function call dev_* macros instead of netdev_* until register_netdev runs.
Another way could be via netdev_*, if we made netdev_* smart enough to log
the driver name and bus ID if netdev->class_dev.class isn't set yet.
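
The second option could look roughly like the following, written as plain
userspace C so it can be tried outside the kernel. The struct definitions
are hypothetical, trimmed stand-ins for the kernel ones, and netdev_prefix()
is a made-up helper name; the "driver bus_id: " form for the unregistered
case is assumed to match what the dev_* macros print.

```c
#include <stdio.h>

/* Hypothetical, trimmed stand-ins for the kernel structures. */
struct device_driver {
	const char *name;
};

struct device {
	const char *bus_id;
	struct device_driver *driver;
};

struct class_device {
	const void *class;	/* NULL until the net_device is registered */
	struct device *dev;
};

struct net_device {
	char name[16];
	struct class_device class_dev;
};

/*
 * Build the message prefix. Once the class is set, the interface name is
 * used; before that, fall back to driver name and bus ID when they are
 * known. Returns what snprintf returns.
 */
int netdev_prefix(const struct net_device *netdev, char *buf, size_t len)
{
	const struct device *dev = netdev->class_dev.dev;

	if (netdev->class_dev.class || !dev || !dev->driver)
		return snprintf(buf, len, "%s: ", netdev->name);

	return snprintf(buf, len, "%s %s: ", dev->driver->name, dev->bus_id);
}
```

A probe function would then get driver-and-bus-ID prefixes automatically up
to the point where registration sets class_dev.class, and interface-name
prefixes afterwards, without switching between dev_* and netdev_* calls.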

There's clearly a difference of opinion among various developers as to whether
logging the interface name alone is sufficient. Either way, I think it's a
win to have the net_device's pointer (as opposed to its name, if you're lucky)
handy when logging info about the net device; and to have the message format
live in one spot (netdevice.h) rather than all over drivers/net.

>
> Just because an interface is called 'eth2' when the message is logged
> doesn't mean it's still eth2 when you actually *read* the message.
> And no, you *can't* rely on finding the "renaming bus-ID to ethN" message
> in the logs - if the system is unstable the last bit of local logging may
> go bye-bye if not synced, and messages about network hardware problems are
> *very* prone to not making it to the syslog server (I wonder why? ;).
>
> Been there, done that, it's not fun. Almost swapped out the wrong eth1
> a *second* time before realizing what was really going on...
>

Was the name slippage due to the intervention of an administrative utility?
Just curious.

Thanks.
Jim Keniston
IBM Linux Technology Center

2003-05-28 01:14:55

by Valdis Klētnieks

Subject: Re: [RFC] [PATCH] Net device error logging (revised)

On Tue, 27 May 2003 16:17:48 PDT, Jim Keniston said:

> Was the name slippage due to the intervention of an administrative utility?
> Just curious.

No, it was due to the *lack* of intervention. Depending on where I am, my
laptop can have up to 4 ethernet interfaces - onboard, docking station,
wireless card, and the ethernet side of the Xircom ethernet/modem card.
And of course 2.4 and 2.5 kernels number them differently, and if I'm
booting single-user /sbin/nameif doesn't get run yet, etc etc etc.

So if I see 'eth2' in a dmesg, I can usually be sure it's not the onboard NIC,
but it could be any one of the other 3 and require some work to disambiguate.
(Yes, 'nameif' does a *wonderful* job of usually keeping all 4 nailed down
to save my sanity - it's just that it seems to fail to do so at the times
it really matters. ;)

