2003-09-15 23:12:12

by Jim Keniston

[permalink] [raw]
Subject: [PATCH] Net device error logging, revised

diff -Naur linux.org/include/linux/netdevice.h linux.netdev.patched/include/linux/netdevice.h
--- linux.org/include/linux/netdevice.h Mon Sep 15 15:58:06 2003
+++ linux.netdev.patched/include/linux/netdevice.h Mon Sep 15 15:58:06 2003
@@ -467,6 +467,9 @@
struct divert_blk *divert;
#endif /* CONFIG_NET_DIVERT */

+ /* NETIF_MSG_* flags to control the types of events we log */
+ int msg_enable;
+
/* class/net/name entry */
struct class_device class_dev;
};
@@ -741,6 +744,7 @@
NETIF_MSG_PKTDATA = 0x1000,
NETIF_MSG_HW = 0x2000,
NETIF_MSG_WOL = 0x4000,
+ NETIF_MSG_ALL = -1, /* always log message */
};

#define netif_msg_drv(p) ((p)->msg_enable & NETIF_MSG_DRV)
@@ -899,6 +903,40 @@
extern void dev_clear_fastroute(struct net_device *dev);
#endif

+/* debugging and troubleshooting/diagnostic helpers. */
+/**
+ * netdev_printk() - Log message with interface name, gated by message level
+ * @sevlevel: severity level -- e.g., KERN_INFO
+ * @netdev: net_device pointer
+ * @msglevel: a standard message-level flag with the NETIF_MSG_ prefix removed.
+ * Unless msglevel is NETIF_MSG_ALL, log the message only if that flag
+ * is set in netdev->msg_enable.
+ * @format: as with printk
+ * @args: as with printk
+ */
+extern int __netdev_printk(const char *sevlevel,
+ const struct net_device *netdev, int msglevel, const char *format, ...);
+#define netdev_printk(sevlevel, netdev, msglevel, format, arg...) \
+ __netdev_printk(sevlevel , netdev , NETIF_MSG_##msglevel , \
+ format , ## arg)
+
+#ifdef DEBUG
+#define netdev_dbg(netdev, msglevel, format, arg...) \
+ netdev_printk(KERN_DEBUG , netdev , msglevel , format , ## arg)
+#else
+#define netdev_dbg(netdev, msglevel, format, arg...) do {} while (0)
+#endif
+
+#define netdev_err(netdev, msglevel, format, arg...) \
+ netdev_printk(KERN_ERR , netdev , msglevel , format , ## arg)
+#define netdev_info(netdev, msglevel, format, arg...) \
+ netdev_printk(KERN_INFO , netdev , msglevel , format , ## arg)
+#define netdev_warn(netdev, msglevel, format, arg...) \
+ netdev_printk(KERN_WARNING , netdev , msglevel , format , ## arg)
+
+/* report fatal error unconditionally; msglevel ignored for now */
+#define netdev_fatal(netdev, msglevel, format, arg...) \
+ netdev_printk(KERN_ERR , netdev , ALL , format , ## arg)

#endif /* __KERNEL__ */

diff -Naur linux.org/net/core/dev.c linux.netdev.patched/net/core/dev.c
--- linux.org/net/core/dev.c Mon Sep 15 15:58:07 2003
+++ linux.netdev.patched/net/core/dev.c Mon Sep 15 15:58:07 2003
@@ -3035,3 +3035,75 @@
}

subsys_initcall(net_dev_init);
+
+static spinlock_t netdev_printk_lock = SPIN_LOCK_UNLOCKED;
+/**
+ * __netdev_printk() - Log message with interface name, gated by message level
+ * @sevlevel: severity level -- e.g., KERN_INFO
+ * @netdev: net_device pointer
+ * @msglevel: a standard message-level flag such as NETIF_MSG_PROBE.
+ * Unless msglevel is NETIF_MSG_ALL, log the message only if
+ * that flag is set in netdev->msg_enable.
+ * @format: as with printk
+ * @args: as with printk
+ *
+ * Does the work for the netdev_printk macro.
+ * For a lot of network drivers, the probe function looks like
+ * ...
+ * netdev = alloc_netdev(...); // or alloc_etherdev(...)
+ * SET_NETDEV_DEV(netdev, dev);
+ * ...
+ * register_netdev(netdev);
+ * ...
+ * netdev_printk and its wrappers (e.g., netdev_err) can be used as
+ * soon as you have a valid net_device pointer -- e.g., from alloc_netdev,
+ * alloc_etherdev, or init_etherdev. (Before that, use dev_printk and
+ * its wrappers to report device errors.) It's common for an interface to
+ * have a name like "eth%d" until the device is successfully configured,
+ * and the call to register_netdev changes it to a "real" name like "eth0".
+ *
+ * If the interface's reg_state is NETREG_REGISTERED, we assume that it has
+ * been successfully set up in sysfs, and we prepend only the interface name
+ * to the message -- e.g., "eth0: NIC Link is Down". The interface
+ * name can be used to find eth0's driver, bus ID, etc. in sysfs.
+ *
+ * For any other value of reg_state, we prepend the driver name and bus ID
+ * as well as the (possibly incomplete) interface name -- e.g.,
+ * "eth%d (e100 0000:00:03.0): Failed to map PCI address..."
+ *
+ * Probe functions that alloc and register in one step (via init_etherdev),
+ * or otherwise register the device before the probe completes successfully,
+ * may need to take other steps to ensure that the failing device is clearly
+ * identified.
+ */
+int __netdev_printk(const char *sevlevel, const struct net_device *netdev,
+ int msglevel, const char *format, ...)
+{
+ if (!netdev || !format) {
+ return -EINVAL;
+ }
+ if (msglevel == NETIF_MSG_ALL || (netdev->msg_enable & msglevel)) {
+ static char msg[512]; /* protected by netdev_printk_lock */
+ unsigned long flags;
+ va_list args;
+ struct device *dev = netdev->class_dev.dev;
+
+ spin_lock_irqsave(&netdev_printk_lock, flags);
+ va_start(args, format);
+ vsnprintf(msg, 512, format, args);
+ va_end(args);
+
+ if (!sevlevel) {
+ sevlevel = "";
+ }
+
+ if (netdev->reg_state == NETREG_REGISTERED || !dev) {
+ printk("%s%s: %s", sevlevel, netdev->name, msg);
+ } else {
+ printk("%s%s (%s %s): %s", sevlevel, netdev->name,
+ dev->driver->name, dev->bus_id, msg);
+ }
+ spin_unlock_irqrestore(&netdev_printk_lock, flags);
+ }
+ return 0;
+}
diff -Naur linux.org/net/netsyms.c linux.netdev.patched/net/netsyms.c
--- linux.org/net/netsyms.c Mon Sep 15 15:58:07 2003
+++ linux.netdev.patched/net/netsyms.c Mon Sep 15 15:58:07 2003
@@ -210,6 +210,7 @@
EXPORT_SYMBOL(net_ratelimit);
EXPORT_SYMBOL(net_random);
EXPORT_SYMBOL(net_srandom);
+EXPORT_SYMBOL(__netdev_printk);

/* Needed by smbfs.o */
EXPORT_SYMBOL(__scm_destroy);


Attachments:
netdev-2.6.0-test5.patch (5.63 kB)

2003-09-17 02:06:10

by Feldman, Scott

[permalink] [raw]
Subject: RE: [PATCH] Net device error logging, revised

> 3. A new macro, netdev_fatal, is included. Given the call
> netdev_fatal(dev, HW, "NIC fried!\n");
> the indicated message is always logged: the msglevel arg (HW, in this
> case) is NOT consulted. In fact, the msglevel arg to netdev_fatal
> is ignored in this implementation. (As previously discussed, in some
> future implementation, the msglevel could be logged to help indicate
> the circumstances under which the event was logged.)

I couldn't find the previous discussion on netdev_fatal, so sorry if
this has already been worked out. It uses KERN_ERR; did you mean
something stronger? If not, why not just use netdev_err(dev, ALL,
"...")? What is the situation in the driver where we'd want to use
_fatal? How do I know when to use _fatal and when to use _err?

-scott

2003-09-17 17:59:43

by Jim Keniston

[permalink] [raw]
Subject: Re: [PATCH] Net device error logging, revised

"Feldman, Scott" wrote:
>
> > 3. A new macro, netdev_fatal, is included. Given the call
> > netdev_fatal(dev, HW, "NIC fried!\n");
> > the indicated message is always logged: the msglevel arg (HW, in this
> > case) is NOT consulted. In fact, the msglevel arg to netdev_fatal
> > is ignored in this implementation. (As previously discussed, in some
> > future implementation, the msglevel could be logged to help indicate
> > the circumstances under which the event was logged.)
>
> I couldn't find the previous discussion on netdev_fatal, so sorry if
> this has already been worked out. It uses KERN_ERR; did you mean
> something stronger? If not, why not just use netdev_err(dev, ALL,
> "...")? What is the situation in the driver where we'd want to use
> _fatal? How do I know when to use _fatal and when to use _err?
>
> -scott

Good question. There was a discussion thread where somebody pared down
the cc list (no LKML, no netdev, an apparently no Scott). I saved most
of the messages, and can send them along to you if you want.

Anyway, in support of passing the NETIF_MSG_* msglevel as an arg to the
netdev_* macros, I said that it...
> also opens the door for logging more clearly what part of the driver (PROBE,
> TX, RX, etc.) the message comes from.

David Brownell replied:
> This is a different issue. Why wouldn't it be enough
> to have unique messages?

I replied:
> First, I don't want acceptance of the netdev_* idea to hinge on this
> minor issue. Everything below is blue(r) sky...
>
> That said, it could be useful for an error-analysis utility to know that
> the message is (say) a WARNING categorized as an RX_ERR, especially if the
> utility hasn't been configured to recognized that specific message string.
> That's probably info that wouldn't come to you via printk (e.g., since
> syslog discards the severity level), but could come via another logging
> or monitoring system that plugs in (along with printk) via netdev_*.
>
> One problem with this idea is that, as I read Becker's document
> (URL above), "fatal" errors should always be logged, and therefore
> not tagged as PROBE, TX_ERR, etc. If we wanted to pursue this idea,
> we could implement
> netdev_fatal(dev, PROBE, "...")
> which logs the message unconditionally (KERN_ERR) and categorizes it as
> a PROBE error.

Jeff Garzik replied:
> printk'ing probe errors is definitely something I want to encourage.
> Probe errors are the most visible indication of what went wrong, if the
> driver fails to load. These are the messages that help developers
> figure out the problem. So I like the direction you suggest, in "If we
> wanted to pursue[...]"

In summary, the idea for netdev_fatal() is based on the idea that the msglevel
arg can not only be used for verbosity control, but can be logged as yet
another useful piece of info. For printk-based logging, it could easily be
added to the message prefix:
eth2: PROBE: Invalid Ethernet address
If you log this message using
netdev_err(dev, PROBE, "Invalid Ethernet address\n")
it will be logged only if the PROBE flag is set in the msg_enable bitmap.
If you log it using
netdev_fatal(dev, PROBE, "Invalid Ethernet address\n")
it will always be logged.

As mentioned above, I don't think the netdev_fatal idea is central to the
whole netdev_printk idea.

By the way, regarding using KERN_ERR vs. something stronger: It's pretty
unusual for a driver to log anything as KERN_CRIT or worse -- presumably
due to a desire not to overreact if the nth of n redundant devices happens
to fail. So unless we can agree on a definition of KERN_CRIT (e.g., it
applies to any failure that makes a device unavailable pending human
intervention), I'd prefer to steer clear of that.

Jim