2003-05-18 21:24:51

by Anton Blanchard

Subject: Naming devices


Hi,

I just spent 2 hours trying to make a machine boot. It had one bad disk
and one bad network card. Normally not a problem, but this thing had 40
cards in it so identifying the problem ones was not straightforward.

I was wondering why we don't have a consistent way of printing a device
location? If all drivers used the same thing, eg:

struct pci_dev *foo;
...
printf("%s: could not enable card\n", PCI_LOCATION(foo));

Which by default would print pci bus/devfn and an arch could override eg
on ppc64 it would also print a location code:

U1.6-P1-I2/E1 (90:0c.0)

This sounds like the domain of the event logging guys but I haven't seen
anything from them in a while. The nice thing about this is that when we
get pci domains nothing needs to be changed in the driver, we just
update the PCI_LOCATION macro.
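
A minimal userspace sketch of the PCI_LOCATION idea follows. It is only an
illustration under stated assumptions: struct mock_pci_dev, the loc_code
field, the ARCH_HAS_PCI_LOCATION switch and both helper functions are
hypothetical names, not existing kernel interfaces, and the macro takes an
extra caller-supplied buffer that the one-argument form above does not have.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct pci_dev; only what the sketch needs. */
struct mock_pci_dev {
    unsigned char bus;      /* PCI bus number */
    unsigned char devfn;    /* device in the upper 5 bits, function in the lower 3 */
    const char *loc_code;   /* assumed arch-supplied location code, or NULL */
};

/* Default formatter: bus:device.function, for example "90:0c.0". */
static const char *pci_location_default(const struct mock_pci_dev *dev,
                                        char *buf, size_t len)
{
    assert(dev != NULL && buf != NULL);
    snprintf(buf, len, "%02x:%02x.%x",
             (unsigned)dev->bus,
             (unsigned)(dev->devfn >> 3) & 0x1fu,
             (unsigned)dev->devfn & 0x07u);
    return buf;
}

/* Hypothetical arch override: location code first, then bus/devfn. */
static const char *pci_location_arch(const struct mock_pci_dev *dev,
                                     char *buf, size_t len)
{
    char bdf[16];

    assert(dev != NULL && buf != NULL);
    pci_location_default(dev, bdf, sizeof(bdf));
    if (dev->loc_code)
        snprintf(buf, len, "%s (%s)", dev->loc_code, bdf);
    else
        snprintf(buf, len, "%s", bdf);
    return buf;
}

/* Drivers would only ever use the macro, so changing the format later
 * (PCI domains, say) means changing this one place. buf must be a char
 * array, not a pointer, because the macro uses sizeof on it. */
#ifdef ARCH_HAS_PCI_LOCATION
#define PCI_LOCATION(dev, buf) pci_location_arch((dev), (buf), sizeof(buf))
#else
#define PCI_LOCATION(dev, buf) pci_location_default((dev), (buf), sizeof(buf))
#endif
```

A driver-side call would then look like
printf("%s: could not enable card\n", PCI_LOCATION(&dev, buf)) with a local
char buf[64].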

Also the tendency of network drivers to print "eth0: foo" during
initialisation is even more of a problem. If you get a bad card then you
could end up reusing the eth0 name for a subsequent device, making
pinpointing the problem card difficult. On top of that some drivers use
dev->name between calling alloc_netdev() and register_netdev() so that
you end up with error messages like "eth%d: failed".

Anton


2003-05-18 22:23:47

by Russell King

Subject: Re: Naming devices

On Mon, May 19, 2003 at 07:33:59AM +1000, Anton Blanchard wrote:
> I was wondering why we don't have a consistent way of printing a device
> location? If all drivers used the same thing, eg:

Isn't this what dev->bus_id in the device structure is supposed to be?
(which is supposed to be a unique bus ID on a particular bus type, in
the pci case, a PCI device.)

> Also the tendency of network drivers to print "eth0: foo" during
> initialisation is even more of a problem. If you get a bad card then you
> could end up reusing the eth0 name for a subsequent device, making
> pinpointing the problem card difficult. On top of that some drivers use
> dev->name between calling alloc_netdev() and register_netdev() so that
> you end up with error messages like "eth%d: failed".

Now that the point has been raised, it seems pretty obvious that
initialisation failures should report the BUS ID of the failing card,
not the logical name assigned by the system to that device which could
change. Once the card is up and running, using the logical name becomes
meaningful - it's the identifier which user space uses to reference the
device.
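
The rule described above (bus ID while the device is still being
initialised, logical name once it is registered) could be sketched as
follows. This is a userspace mock, not kernel code: struct mock_netdev, its
registered flag and netdev_log_id() are hypothetical names introduced for
the illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a network device. */
struct mock_netdev {
    char name[16];        /* "eth%d" template before registration, "eth0" after */
    const char *bus_id;   /* stable bus identifier, for example "90:0c.0" */
    int registered;       /* nonzero once registration has succeeded */
};

/* Pick the identifier to put in a log message.
 * Before registration the logical name is either an unexpanded template
 * or may later be reused by another card, so fall back to the bus ID. */
static const char *netdev_log_id(const struct mock_netdev *dev)
{
    assert(dev != NULL);
    if (!dev->registered || strchr(dev->name, '%') != NULL)
        return dev->bus_id;
    return dev->name;
}
```

With this, a failure during probing would be reported against "90:0c.0"
rather than "eth%d", and messages after registration would use "eth0".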

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2003-05-19 01:14:06

by Daniel Stekloff

Subject: Re: Naming devices

On Sunday 18 May 2003 02:33 pm, Anton Blanchard wrote:
> Hi,
>
> I just spent 2 hours trying to make a machine boot. It had one bad disk
> and one bad network card. Normally not a problem, but this thing had 40
> cards in it so identifying the problem ones was not straightforward.
>
> I was wondering why we don't have a consistent way of printing a device
> location? If all drivers used the same thing, eg:
>
> struct pci_dev *foo;
> ...
> printf("%s: could not enable card\n", PCI_LOCATION(foo));
>
> Which by default would print pci bus/devfn and an arch could override eg
> on ppc64 it would also print a location code:
>
> U1.6-P1-I2/E1 (90:0c.0)
>
> This sounds like the domain of the event logging guys but I haven't seen
> anything from them in a while. The nice thing about this is that when we
> get pci domains nothing needs to be changed in the driver, we just
> update the PCI_LOCATION macro.
>
> Also the tendency of network drivers to print "eth0: foo" during
> initialisation is even more of a problem. If you get a bad card then you
> could end up reusing the eth0 name for a subsequent device, making
> pinpointing the problem card difficult. On top of that some drivers use
> dev->name between calling alloc_netdev() and register_netdev() so that
> you end up with error messages like "eth%d: failed".


Hi Anton,

We have been working on device macros that add standard prefixes to printk
messages. The purpose of the prefix is to identify the device in the message
with a specific device or sysfs directory. Generic device macros are already
in the 2.5 kernel in include/linux/device.h - dev_err, dev_info, etc. They
prefix printk messages with dev->bus_id and the driver name.

Just last week or so, Jim Keniston asked for comments on network device
specific macros - netdev_printk. I thought these were handy when I was
working on a system with 4 ethernet cards. With the e1000 patch, I could
identify the device without having to use ethtool because netdev_printk
includes the PCI device ID in the prefix of the message. I could tell which
device eth0 referred to from the message.

One of the reasons why we decided on the wrapper macros is the ability to
change the prefix in the future without impacting device drivers that have
implemented those macros. We could add more information from the device
structure to the message without requiring device drivers to change anything.
We could also use those macros as a hook to provide more functionality, like
building templates based on calling function and format string to identify the
message uniquely, without impacting the driver.
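
To illustrate why a wrapper keeps drivers insulated from prefix changes,
here is a rough userspace sketch. It is not the code in
include/linux/device.h: struct mock_device, mock_dev_printf() and
mock_dev_err() are hypothetical, the "driver bus_id: " prefix layout is an
assumption, and the message is formatted into a buffer instead of being
passed to printk so that the result can be inspected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct device. */
struct mock_device {
    const char *driver_name;   /* for example "e1000" */
    const char *bus_id;        /* for example "90:0c.0" */
};

/* Single place that decides what the prefix looks like. */
static int mock_dev_printf(char *buf, size_t len,
                           const struct mock_device *dev,
                           const char *fmt, ...)
{
    va_list ap;
    int n, m;

    assert(buf != NULL && dev != NULL && fmt != NULL);
    n = snprintf(buf, len, "%s %s: ", dev->driver_name, dev->bus_id);
    if (n < 0 || (size_t)n >= len)
        return n;               /* error, or prefix alone filled the buffer */

    va_start(ap, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}

/* Drivers call the macro only; extending the prefix later means editing
 * mock_dev_printf(), not the drivers. */
#define mock_dev_err(buf, len, dev, ...) \
    mock_dev_printf((buf), (len), (dev), __VA_ARGS__)
```

A call such as mock_dev_err(buf, sizeof(buf), &dev, "could not enable card")
leaves the driver unaware of how the device is identified in the output.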

Yet the macros we've been supplying are a bit rigid. Perhaps we should have
something like you've suggested that could be used by driver writers to tag a
message with a specific device location while not requiring the use of a
whole wrapper macro. Plus, you could override the result based on arch. You
wouldn't get the benefits of the current device macros, but you would be able
to identify the message with a specific device.


Thanks,

Dan






2003-05-19 01:35:09

by David Miller

Subject: Re: Naming devices

On Sun, 2003-05-18 at 18:22, Daniel Stekloff wrote:
> Just last week or so, Jim Keniston asked for comments on network device
> specific macros - netdev_printk. I thought these were handy when I was
> working on a system with 4 ethernet cards.

I don't understand how this is useful for this application.
If I put 1,000 e1000 cards into the machine, all the messages
scroll out of the dmesg buffer.

The only reliable source for this kind of information is ethtool.
The kernel message buffer is like IP datagram delivery in that it is
unreliable, whereas ethtool provides a stable source for this
information.

All I hear is that "hey we're making printk provide the same
information as ethtool", and when duplicating functionality you
ought to have a real good reason for it :-)

--
David S. Miller <[email protected]>

2003-05-19 03:30:44

by Anton Blanchard

Subject: Re: Naming devices


> Isn't this what dev->bus_id in the device structure is supposed to be?
> (which is supposed to be a unique bus ID on a particular bus type, in
> the pci case, a PCI device.)

We could use that, although for ppc64 I'd like to increase its size and
stash the physical location in there as well.

> Now that the point has been raised, it seems pretty obvious that
> initialisation failures should report the BUS ID of the failing card,
> not the logical name assigned by the system to that device which could
> change. Once the card is up and running, using the logical name becomes
> meaningful - it's the identifier which user space uses to reference the
> device.

Sounds good to me.

Anton