2009-03-24 15:56:22

by Matt Domsch

Subject: Network Device Naming mechanism and policy

You may recall http://lkml.org/lkml/2006/9/29/268, wherein I described
network device enumeration and naming challenges, and several possible
fixes. Of these, Fix #1 (fix the PCI device list to be sorted
breadth-first) has been implemented in the kernel, and Fix #3 (system
board routing rules) has been implemented on Dell PowerEdge 10G and
11G servers (11G begins selling RSN).

However, these have not been completely satisfactory. In particular,
it keeps getting harder and harder to route PCI-Express lanes to
guarantee the same ordering between a depth-first and breadth-first
walk, and it turns out that isn't sufficient anyhow.


Problem: Users expect on-motherboard NICs to be named eth0..ethN. This can be difficult to achieve.

Ethernet device names are initially assigned by the kernel, and may be
changed by udev or nameif in userspace. The initial name assigned by
the kernel is in monotonically increasing order, starting with eth0.
In this instance, the enumeration directly leads to an assigned name.

Complications:

1) Devices are discovered, and presented to the kernel for name
assignment, based on several factors:

a) the kernel hotplug mechanism emits events for udev to catch, to
load the appropriate driver for a given device. The kernel
emits these events in some ordering, tied to the depth-first PCI
bus walk. Therefore the order in which userspace catches these
events and starts to load a given device driver is tied to the
depth-first bus walk. There is no guarantee within PCI-Express
hardware topology of any ordering to the discovery of devices.

To ease this complication, SMBIOS 2.6 includes a mechanism for
BIOS to specify its expected ordering of devices, for naming
purposes. Tools such as biosdevname use this information.


b) udev may run modprobes in parallel. It guarantees that the
events and modprobes are begun in order, but makes no guarantee
that one event's modprobe completes before beginning a second
modprobe. This leads to naming races in the kernel, as drivers
begun in parallel, which discover their own devices, present
them to the kernel for name assignment. In this scenario, if
you have multiple device drivers for multiple NIC types (say,
bnx2 and e1000) in the same system, the kernel's naming of the
ports is non-deterministic. On one boot you may have two e1000
ports as eth0 and eth1, then a bnx2 port as eth2, then another
e1000 port as eth3; on a subsequent boot, you may have the ports
assigned other names. The ports are assigned names "in order"
if you only look within a single device driver, but may be "out
of order" if you look across all the drivers.

To get any consistent ordering now, one of two things must
happen:

i) drivers must be loaded before udev begins loading drivers
(either very early in initscripts, or in the initial ramdisk).
ii) something must "fix up" the kernel-assigned names after
udev's modprobes complete. udev does this as well.

2) udev may have rules to change the device names. This is most often
seen in the '70-persistent-net.rules' file. Here we have
additional challenges:

a) this does not exist the first time devices are discovered; the
naming may be incorrect during first discovery, leading to the
names being permanently incorrect (unless this file is edited).

b) it introduces state (MAC addresses) to the system, on a system
that would otherwise not need state. This complicates
image-based deployments, Live Media-based deployments, and other
stateless deployments.

c) udev may not always be able to change a device's name. If udev
uses the kernel assignment namespace (ethN), then a rename of
eth0->eth1 may require renaming eth1->eth0 (or something else).
Because udev operates on a single device instance at a time, it
becomes difficult to switch names around for multiple devices within
the single namespace.


3) End users have the (reasonable?) expectation that NIC ports
embedded on the system are named eth0..ethN (Dell sells servers
with 4 NICs onboard), and that add-in NICs get assigned names
ethN+1..., ideally in physical PCI slot order. After install, using
udev to set up rules, we can accomplish this (again using the
SMBIOS 2.6 information), but with the complications noted
above.

4) When adding a network card to an existing system, what should the
ports on the new card be named? If it is added, they will be named
ethN+1... above the existing named cards. This means a (new)
add-in card in PCI slot 3 may have ports named eth5 and eth6, while
an add-in card in PCI slot 5 may have ports named eth2 and eth3.
This is not intuitive.

This really doesn't address the notion of names matching some
physical attribute. If you look at a network switch, the naming of
the ports both in management software and on chassis labels is
based on physical location, e.g. slot 4, port 2. For add-in PCI
cards, being able to match a logical device name to a physical port
name is important. The ethtool -p (flash the port's LEDs) trick
works alright, but still requires a good bit of human interaction
to know which port is a given ethN number (at the moment...).

Nor does it address the desire to name devices based on their usage
(e.g. name the ports public, dmz, private, management, backup,
storage).
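
The race described in 1b can be illustrated with a small model. This is
only a sketch: the port labels and the two registration orders are made
up for illustration, and the function stands in for the kernel handing
out the next free ethN as each port is presented to it.

```python
from itertools import count


def assign_names(registration_order):
    """Hand out eth0, eth1, ... in the order ports are registered."""
    counter = count()
    return {port: f"eth{next(counter)}" for port in registration_order}


# Hypothetical system: three e1000 ports and one bnx2 port.
# Each list is one possible order in which the drivers, loaded in
# parallel, happen to present their ports for naming.
boot_a = ["e1000-a", "e1000-b", "bnx2-a", "e1000-c"]
boot_b = ["bnx2-a", "e1000-a", "e1000-b", "e1000-c"]

names_a = assign_names(boot_a)
names_b = assign_names(boot_b)

for port in sorted(names_a):
    print(port, names_a[port], names_b[port])
```

Within one driver the ports stay in order on both boots, but the bnx2
port moves between eth2 and eth0, which shifts every e1000 name.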


I'd like to see a distinction made between kernel-assigned names, and
user-visible names, for network devices. We already see this
distinction with non-network devices, in that /dev/sda is "some disk",
yet /dev/disk/by-label/mybootdisk is a symlink to /dev/sda. Tools
that care about the human-interesting names use the /dev/disk/by-label
name. Udev takes care of the symlinks. Network devices do not have
such a method for providing alternative names for a single device,
that I am aware of.


In my ideal world, I would like to see users' expectations of network
device naming changed (much as we did in the ide -> libata transition,
where disks went from being named /dev/hda to /dev/sda, with all the
complications that entailed). I'd like for the names a sysadmin uses
to be physical-based, with on-board NICs named accordingly, and add-in
NICs named for the PCI slot they occupy. (I'll set aside non-PCI
add-ins, such as USB, for a bit...)

biosdevname (http://linux.dell.com/projects.shtml#biosdevname) takes a
stab at this. It can be integrated into udev, such that the
70-persistent-net.rules file is never used, and the naming for each
device comes from several different policies. Its primary drawback is
that it changes the device namespace, which some sysadmins, and tools,
may not like. Names for devices become eth_s0_0 for the first
onboard NIC, eth_s0_1 for the second; eth_s3_3 for the fourth port
on PCI Slot #3, etc.
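
As a rough sketch of that naming pattern (this is not biosdevname's
code; the function name is hypothetical, and treating slot 0 as
"onboard" with zero-based port numbers is inferred only from the
examples above):

```python
def slot_based_name(slot, port):
    """Build a name in the eth_s<slot>_<port> pattern.

    Assumptions: slot 0 stands for the onboard NICs, and ports are
    counted from zero within a slot.
    """
    if slot < 0 or port < 0:
        raise ValueError("slot and port must be non-negative")
    return f"eth_s{slot}_{port}"


print(slot_based_name(0, 0))  # first onboard NIC
print(slot_based_name(0, 1))  # second onboard NIC
print(slot_based_name(3, 3))  # fourth port on PCI slot 3
```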

If we wish to avoid changing the namespace, (i.e. to keep using ethN),
then we need some method to "fix up" the ethN namespace to be
"correct".


Some options:

Option 0: do nothing different. Don't use biosdevname. Keep udev
as-is. Users continue to have to figure out, for each system type and
potentially for each boot, which NIC is connected to which name. This
has been the #1 customer complaint about Linux on Dell servers for
several years. I'd prefer not to keep it this way.

Option 1: use udev + biosdevname, and change the device namespace,
from ethN to eth_sX_Y, or similar. This solves the problem cleanly,
but changes the names users presently expect.

Option 2: Add alternative names for network devices in some fashion.
The kernel would then assign both the kernel-name (say, en0), and the
initial alternative name (say, eth0), but userspace could then adjust
the alternative name as it sees fit based on naming policy (physical
location, usage, etc.). Bonus points for allowing multiple
alternative names for a single device, so you can have both
physical-based names and usage-based names, for a single device (as we
do for /dev/disk/by-*).

Option 3: INSERT YOUR IDEA HERE


I'm looking for these or additional options for how to solve this,
once and for all.
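
To make Option 2 a little more concrete, here is a minimal userspace
model of "one kernel name, many alternative names". Nothing like this
exists today as far as I know; the class, its methods, and the example
names are all hypothetical.

```python
class NetNames:
    """Toy registry mapping alternative names onto kernel names."""

    def __init__(self):
        self._alias_to_kernel = {}

    def add_alias(self, kernel_name, alias):
        """Attach an alternative name; an alias may point at one device only."""
        owner = self._alias_to_kernel.get(alias)
        if owner is not None and owner != kernel_name:
            raise ValueError(f"{alias} already names {owner}")
        self._alias_to_kernel[alias] = kernel_name

    def resolve(self, name):
        """Return the kernel name for an alias, or the name itself."""
        return self._alias_to_kernel.get(name, name)

    def aliases(self, kernel_name):
        """List every alternative name attached to a kernel name."""
        return sorted(a for a, k in self._alias_to_kernel.items()
                      if k == kernel_name)


names = NetNames()
names.add_alias("en0", "eth0")      # initial alternative name
names.add_alias("en0", "eth_s0_0")  # physical-location policy
names.add_alias("en0", "public")    # usage-based policy

print(names.resolve("public"))
print(names.aliases("en0"))
```

The kernel name never changes here, so policies can add or replace
aliases without the rename conflicts described in 2c.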

Thanks,
Matt

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux


2009-03-24 16:21:58

by Patrick McHardy

Subject: Re: Network Device Naming mechanism and policy

Matt Domsch wrote:
> 2) udev may have rules to change the device names. This is most often
> seen in the '70-persistent-net.rules' file. Here we have
> additional challenges:
>
> ...
>
> c) udev may not always be able to change a device's name. If udev
> uses the kernel assignment namespace (ethN), then a rename of
> eth0->eth1 may require renaming eth1->eth0 (or something else).
> Udev operates on a single device instance at a time, it becomes
> difficult to switch names around for multiple devices, within
> the single namespace.

I would classify this as a bug, especially the fact that udev doesn't
undo a failed rename, so you end up with ethX_rename. Virtual devices
using the same MAC address trigger this reliably unless you add
exceptions to the udev rules.

You state that it only operates on one device at a time. If that is
correct, I'm not sure why the _rename suffix is used at all instead
of simply trying to assign the final name, which would avoid this
problem.

2009-03-24 16:29:50

by Kay Sievers

Subject: Re: Network Device Naming mechanism and policy

On Tue, Mar 24, 2009 at 17:21, Patrick McHardy wrote:
> Matt Domsch wrote:
>>
>> 2) udev may have rules to change the device names.  This is most often
>>   seen in the '70-persistent-net.rules' file.  Here we have
>>   additional challenges:
>>
>> ...
>>
>>   c) udev may not always be able to change a device's name.  If udev
>>      uses the kernel assignment namespace (ethN), then a rename of
>>      eth0->eth1 may require renaming eth1->eth0 (or something else).
>>      Udev operates on a single device instance at a time, it becomes
>>      difficult to switch names around for multiple devices, within
>>      the single namespace.
>
> I would classify this as a bug, especially the fact that udev doesn't
> undo a failed rename, so you end up with ethX_rename. Virtual devices
> using the same MAC address trigger this reliably unless you add
> exceptions to the udev rules.

This is handled in most cases. Virtual interfaces claiming a
configured name and created before the "hardware" interface are not
handled, that's right, but pretty uncommon.

> You state that it only operates on one device at a time. If that is
> correct, I'm not sure why the _rename suffix is used at all instead
> of simply trying to assign the final name, which would avoid this
> problem.

How? The kernel assigns the names and the configured names may
conflict. So you possibly can not rename a device to the target name
when its name is already taken. I don't see how to avoid this.

Thanks,
Kay

2009-03-24 16:39:39

by Patrick McHardy

Subject: Re: Network Device Naming mechanism and policy

Kay Sievers wrote:
> On Tue, Mar 24, 2009 at 17:21, Patrick McHardy wrote:
>> Matt Domsch wrote:
>>> c) udev may not always be able to change a device's name. If udev
>>> uses the kernel assignment namespace (ethN), then a rename of
>>> eth0->eth1 may require renaming eth1->eth0 (or something else).
>>> Udev operates on a single device instance at a time, it becomes
>>> difficult to switch names around for multiple devices, within
>>> the single namespace.
>> I would classify this as a bug, especially the fact that udev doesn't
>> undo a failed rename, so you end up with ethX_rename. Virtual devices
>> using the same MAC address trigger this reliably unless you add
>> exceptions to the udev rules.
>
> This is handled in most cases. Virtual interfaces claiming a
> configured name and created before the "hardware" interface are not
> handled, that's right, but pretty uncommon.

I don't remember the exact circumstances, but I've seen it quite a few
times. I'll gather some information next time.

>> You state that it only operates on one device at a time. If that is
>> correct, I'm not sure why the _rename suffix is used at all instead
>> of simply trying to assign the final name, which would avoid this
>> problem.
>
> How? The kernel assigns the names and the configured names may
> conflict. So you possibly can not rename a device to the target name
> when its name is already taken. I don't see how to avoid this.

Sure, you can't rename it when the name is taken. But what udev
apparently does when renaming a device is:

- rename eth0 to eth0_rename
- rename eth0_rename to eth2
- rename returns -EEXIST: udev keeps eth0_rename

What it could do is:

- rename eth0 to eth2
- rename returns -EEXIST: device at least still has a proper name

Alternatively it should unroll the rename and hope that the
old name is still free. But I don't see why the _rename step
would do any good, assuming only a single device is handled at
a time, it can't prevent clashes.
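
The two sequences can be compared in a small model. This is a sketch
only, not udev's code: the Namespace class and both helper functions
are hypothetical, and a taken target name is modelled as rename()
returning False rather than as a real errno.

```python
class Namespace:
    """Toy model of the kernel's set of interface names."""

    def __init__(self, names):
        self.names = set(names)

    def rename(self, old, new):
        """Rename old to new; fail (return False) if new is taken."""
        if new in self.names:
            return False
        self.names.remove(old)
        self.names.add(new)
        return True


def rename_via_temp(ns, old, new):
    """First sequence: go through <old>_rename and stay there on failure."""
    tmp = old + "_rename"
    ns.rename(old, tmp)
    return new if ns.rename(tmp, new) else tmp


def rename_direct(ns, old, new):
    """Second sequence: try the final name, keep the old one on failure."""
    return new if ns.rename(old, new) else old


# eth2 is already taken in both cases.
print(rename_via_temp(Namespace({"eth0", "eth2"}), "eth0", "eth2"))
print(rename_direct(Namespace({"eth0", "eth2"}), "eth0", "eth2"))
```

With only one device in play, the temporary step changes nothing
except which name the device is left with after the failure.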

2009-03-24 16:43:43

by Dan Williams

Subject: Re: Network Device Naming mechanism and policy

On Tue, 2009-03-24 at 17:21 +0100, Patrick McHardy wrote:
> Matt Domsch wrote:
> > 2) udev may have rules to change the device names. This is most often
> > seen in the '70-persistent-net.rules' file. Here we have
> > additional challenges:
> >
> > ...
> >
> > c) udev may not always be able to change a device's name. If udev
> > uses the kernel assignment namespace (ethN), then a rename of
> > eth0->eth1 may require renaming eth1->eth0 (or something else).
> > Udev operates on a single device instance at a time, it becomes
> > difficult to switch names around for multiple devices, within
> > the single namespace.
>
> I would classify this as a bug, especially the fact that udev doesn't
> undo a failed rename, so you end up with ethX_rename. Virtual devices
> using the same MAC address trigger this reliably unless you add
> exceptions to the udev rules.

Any particular reason the MAC addresses are the same? This came up a
while ago with the 'dnet' device in the thread "Dave DNET ethernet
controller".

If the MAC address isn't a UUID for the device, then *what* is?

If there isn't one, then certainly udev can't be blamed for getting
ordering or names wrong, because there's nothing to use to actually
match up the device to a name, uniquely. Note that combinations
including bus IDs or device positions in the bus don't work for any type
of hotplug case, because you can plug another adapter into the same
location but it's a different adapter.

Either people want (a) a name assigned to a specific device (which
implies a UUID like a MAC address stored on that device somewhere
accessible to the driver at plug/boot time), or they want (b) to assign
a name to a *position* on the PCI or USB or firewire or whatever bus, or
they (c) don't care about this at all.

The answer is really 'all of the above'. Most of the people Matt cares
about are probably in the (b) camp. But most desktop/laptop users are
in the (a) camp because they use hotplug so much.

Dan

> You state that it only operates on one device at a time. If that is
> correct, I'm not sure why the _rename suffix is used at all instead
> of simply trying to assign the final name, which would avoid this
> problem.

2009-03-24 17:01:17

by Alan

Subject: Re: Network Device Naming mechanism and policy

> If the MAC address isn't a UUID for the device, then *what* is?

MAC is technically per system if desired (eg old Sun boxes) and that is
quite valid by IEEE 802.3. In that case you need MAC + topology.

If you are running DECnet your system runs on assigned MAC addresses so
you also have to be careful to use the EPROM MAC (if one exists which is
99.9% of the time) + topology.

> Either people want (a) a name assigned to a specific device (which
> implies a UUID like a MAC address stored on that device somewhere
> accessible to the driver at plug/boot time), or they want (b) to assign
> a name to a *position* on the PCI or USB or firewire or whatever bus, or
> they (c) don't care about this at all.

I'd argue the fundamental problem is that I can do this

ln -s /dev/sda /dev/thebigdiskunderthefridge

but cannot ln -s /dev/eth0 /dev/ethernet/slot0

and the SIOCGIF/SIF BSD style ioctl interface doesn't do pathnames or
file handles of network devices.

Anyone feel up to putting all the network devices into dev space and
fixing the ioctls ;)

2009-03-24 17:02:47

by Scott James Remnant

Subject: Re: Network Device Naming mechanism and policy

On Tue, 2009-03-24 at 10:46 -0500, Matt Domsch wrote:

> b) udev may run modprobes in parallel. It guarantees that the
> events and modprobes are begun in order, but makes no guarantee
> that one event's modprobe completes before beginning a second
> modprobe. This leads to naming races in the kernel, as drivers
> begun in parallel, which discover their own devices, present
> them to the kernel for name assignment.
>
Also bear in mind that a module completing init() does not necessarily
mean that the interfaces have been created. If the driver requires
firmware, it will call out to userspace, and may not register the
interface until well afterwards.

One could even construct a pathological case where only a virtual device
was registered, and userspace was required to add logical interfaces
(most likely in a udev rule).

> 2) udev may have rules to change the device names. This is most often
> seen in the '70-persistent-net.rules' file. Here we have
> additional challenges:
>
> a) this does not exist the first time devices are discovered; the
> naming may be incorrect during first discovery, leading to the
> names being permanently incorrect (unless this file is edited).
>
Well, the obvious fix to this is to make sure the names are always
correct :)

> c) udev may not always be able to change a device's name. If udev
> uses the kernel assignment namespace (ethN), then a rename of
> eth0->eth1 may require renaming eth1->eth0 (or something else).
> Udev operates on a single device instance at a time, it becomes
> difficult to switch names around for multiple devices, within
> the single namespace.
>
Actually udev handles this by using a temporary name. When renaming
eth0->eth1 it actually uses an intermediate name first. This allows it
to simultaneously swap eth0<->eth1 since one unblocks the other
(actually both unblock each other).

There is a failure case where two devices both end up trying to get the
same name, in which case one will lock with a "_rename" name. There was
an early debate in Ubuntu when we first wrote this code about using
later names (eth2, eth3, etc.) but we realised that just hides the
problem (and it happens again if you plug in a pccard or something that
wants eth2).

Since this is always a bug, making the problem visible was a "good
thing".
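
A toy model of the swap and of the failure case, for illustration only.
It is not the udev implementation: the function and device labels are
made up, and concurrent event handlers are simplified into "everyone
moves to a temporary name, then everyone retries until nothing changes".

```python
def rename_with_temp(current, wanted):
    """current: device -> present name; wanted: device -> configured name."""
    names = dict(current)
    pending = [dev for dev in names
               if wanted.get(dev, names[dev]) != names[dev]]

    # Step 1: each device to be renamed frees its own name first.
    for dev in pending:
        names[dev] = names[dev] + "_rename"

    # Step 2: retry until no device can make further progress.
    progress = True
    while pending and progress:
        progress = False
        for dev in list(pending):
            if wanted[dev] not in names.values():
                names[dev] = wanted[dev]
                pending.remove(dev)
                progress = True
    return names


# Swap: each device wants the other's name.
print(rename_with_temp({"nicA": "eth0", "nicB": "eth1"},
                       {"nicA": "eth1", "nicB": "eth0"}))

# Conflict: nicB is configured for a name nicA legitimately holds.
print(rename_with_temp({"nicA": "eth0", "nicB": "eth1"},
                       {"nicB": "eth0"}))
```

In the swap both devices end up where they were configured to be; in
the conflict one device is left visibly stuck on its "_rename" name.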

> biosdevname (http://linux.dell.com/projects.shtml#biosdevname) takes a
> stab at this. It can be integrated into udev, such that the
> 70-persistent-net.rules file is never used, and the naming for each
> device comes from several different policies. Its primary drawback is
> that it changes the device namespace, which some sysadmins, and tools,
> may not like. Names for devices become eth_s0_0 for the first
> onboard NIC, eth_s0_1 for the second; eth_s3_3 for the fourth port
> on PCI Slot #3, etc.
>
While this works for PCI slots, it already doesn't scale to other buses.
For example what slot number is the pccard slot? If you have two
different pccard devices, would they get assigned the same name (udev
currently assigns them different names).

Now consider USB. Would the device name change depending on which USB
port you plugged it into? Or is USB just a single slot, in which case
what happens when you have two USB ethernet devices?

The Apple USB Ethernet device in my iPhone is not the USB Wireless
adapter I own, both have very different networking configurations.

it's not ideal in the laptop world. Consider a user with two different

> Option 3: INSERT YOUR IDEA HERE
>
I quite liked the idea of /dev/eth0, then we could just use symlinks.

Scott
--
Scott James Remnant

2009-03-24 17:04:53

by Patrick McHardy

Subject: Re: Network Device Naming mechanism and policy

Dan Williams wrote:
> On Tue, 2009-03-24 at 17:21 +0100, Patrick McHardy wrote:
>> I would classify this as a bug, especially the fact that udev doesn't
>> undo a failed rename, so you end up with ethX_rename. Virtual devices
>> using the same MAC address trigger this reliably unless you add
>> exceptions to the udev rules.
>
> Any particular reason the MAC addresses are the same? This came up a
> while ago with the 'dnet' device in the thread "Dave DNET ethernet
> controller".
>
> If the MAC address isn't a UUID for the device, then *what* is?

Sometimes (I was referring to virtual devices) there may not be
one, that's correct.

> If there isn't one, then certainly udev can't be blamed for getting
> ordering or names wrong, because there's nothing to use to actually
> match up the device to a name, uniquely.

I agree that udev can't do anything useful in that case. I would
prefer if it wouldn't even try though instead of messing with the
names and leaving a bunch of _rename devices around. Sure, I can
add a rule to disable it, but that shouldn't be necessary.

Generally, I'm wondering whether it should touch virtual network
devices at all since the MAC addresses are often not persistent,
sometimes not unique and the name might have already been chosen
explicitly by the administrator when creating the device.

Currently there are some rules to ignore a couple of known virtual
devices types. Are there actually cases where renaming virtual
devices is desired? Otherwise a more future-proof way than
blacklisting each type individually would be to add some attribute
informing udev that the device has no unique key and should be
ignored.

2009-03-24 17:14:52

by Karl O. Pinc

Subject: Re: Network Device Naming mechanism and policy

My thoughts on the subject; from someone who is not
particularly qualified to have opinions.

Reading over your post, I searched for a single sentence describing
the problem you're trying to solve. What I came up with was
this:

On 03/24/2009 10:46:17 AM, Matt Domsch wrote:

> Users continue to have to figure out, for each system type
> and
> potentially for each boot, which NIC is connected to which name. This
> has been the #1 customer complaint about Linux on Dell servers for
> several years. I'd prefer not to keep it this way.

Perhaps a little magic in the udev rule that creates the
z70_persistent-net-rules file would solve the basic problem.
It could sort the nics by mac address when creating the
names. It need only run when the z70 file does not exist.
I presume this would produce consistent results in most cases
and it feels technically feasible; although I am not
fully qualified to make that judgment.
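
What I have in mind is roughly the following. It is a sketch only:
the function name and the MAC addresses are made up, and nothing like
this is claimed to exist in udev.

```python
def names_by_mac(macs):
    """Assign eth0, eth1, ... to NICs in ascending MAC address order."""
    ordered = sorted(macs, key=lambda m: bytes.fromhex(m.replace(":", "")))
    return {mac: f"eth{i}" for i, mac in enumerate(ordered)}


# Hypothetical addresses, listed in the order they were discovered.
discovered = ["00:1a:2b:3c:4d:60", "00:1a:2b:3c:4d:5e", "00:1a:2b:3c:4d:5f"]

for mac, name in names_by_mac(discovered).items():
    print(name, mac)
```

The result depends only on the set of addresses, not on the order in
which the devices happened to be discovered.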

Rather than put the onus on udev to make the above
change Dell could just run a little program at first
boot that mungs the z70 file as desired. (It could then
force a reboot; I forget if this would be needed.)
I imagine Dell boots the boxes once at the factory,
but if not then the user has to suffer with a longer
boot process at first boot. Because this is driven
by Dell, Dell would know exactly what nic has what
name. And Dell knows what nics are on the mobo and
what are not, and so can control the mac address sort
order as desired.

The other solution that screams out at me is to ditch
those legacy BIOSes and go to something like LinuxBIOS.
Again, I'm not really qualified, but it sure feels like
there's an answer in this approach.

The other point that struck me was that sometimes, it seems,
users want persistence in the naming of their network devices
and sometimes they want device names based on bus position.

The sucky thing is that symlinks and nics don't mix well
and so it seems impossible to satisfy both the above
requirements at the same time. This is an area that
IMHO could be better addressed by the Linux community.

Karl
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

2009-03-24 17:53:38

by Matt Domsch

Subject: Re: Network Device Naming mechanism and policy

On Tue, Mar 24, 2009 at 05:02:19PM +0000, Scott James Remnant wrote:
> On Tue, 2009-03-24 at 10:46 -0500, Matt Domsch wrote:
> > biosdevname (http://linux.dell.com/projects.shtml#biosdevname) takes a
> > stab at this. It can be integrated into udev, such that the
> > 70-persistent-net.rules file is never used, and the naming for each
> > device comes from several different policies. Its primary drawback is
> > that it changes the device namespace, which some sysadmins, and tools,
> > may not like. Names for devices become eth_s0_0 for the first
> > onboard NIC, eth_s0_1 for the second; eth_s3_3 for the fourth port
> > on PCI Slot #3, etc.
> >
> While this works for PCI slots, it already doesn't scale to other buses.
> For example what slot number is the pccard slot? If you have two
> different pccard devices, would they get assigned the same name (udev
> currently assigns them different names).

Actually biosdevname handles this already, using eth_pccard_X.Y where
X = socket and Y = function.

> Now consider USB. Would the device name change depending on which USB
> port you plugged it into? Or is USB just a single slot, in which case
> what happens when you have two USB ethernet devices?
>
> The Apple USB Ethernet device in my iPhone is not the USB Wireless
> adapter I own, both have very different networking configurations.

We would obviously need a solution. eth_usb_{something} perhaps.

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux

2009-03-24 17:57:56

by Matt Domsch

Subject: Re: Network Device Naming mechanism and policy

On Tue, Mar 24, 2009 at 11:42:57AM -0500, Karl O. Pinc wrote:
> My thoughts on the subject; from someone who is not
> particularly qualified to have opinions.
>
> Reading over your post, I searched for a single sentence describing
> the problem you're trying to solve. What I came up with was
> this:
>
> On 03/24/2009 10:46:17 AM, Matt Domsch wrote:
>
> > Users continue to have to figure out, for each system type
> >and
> >potentially for each boot, which NIC is connected to which name. This
> >has been the #1 customer complaint about Linux on Dell servers for
> >several years. I'd prefer not to keep it this way.

Yeah, that's pretty much it.

> Perhaps a little magic in the udev rule that creates the
> z70_persistent-net-rules file would solve the basic problem.
> It could sort the nics by mac address when creating the
> names. It need only run when the z70 file does not exist.
> I presume this would produce consistent results in most cases
> and it feels technically feasible; although I am not
> fully qualified to make that judgment.
>
> Rather than put the onus on udev to make the above
> change Dell could just run a little program at first
> boot that mungs the z70 file as desired. (It could then
> force a reboot; I forget if this would be needed.)
> I imagine Dell boots the boxes once at the factory,

Nearly all Dell systems running Linux in the world were not
factory-installed with that OS. This isn't something I can simply
patch in our factories. It needs to be fixed as far upstream as
possible.

> but if not then the user has to suffer with a longer
> boot process at first boot. Because this is driven
> by Dell, Dell would know exactly what nic has what
> name. And Dell knows what nics are on the mobo and
> what are not, and so can control the mac address sort
> order as desired.

Well, there is no "mac address sort" anywhere (nor is that really a
good algorithm to use).


> The other solution that screams out at me is to ditch
> those legacy BIOSes and go to something like LinuxBIOS.
> Again, I'm not really qualified, but it sure feels like
> there's an answer in this approach.

It's not a BIOS problem. BIOS can inform the OS of what it thinks
about hardware location, names, etc. And our PowerEdge (9G and newer)
servers do - using SMBIOS 2.6 standard features we added (types 9, 10,
and 41) to the specification - exactly to allow such. Now something
needs to use that information. That something today is biosdevname,
which could be more cleanly integrated with udev.

> The other point that struck me was that sometimes, it seems,
> users want persistence in the naming of their network devices
> and sometimes they want device names based on bus position.

indeed

> The sucky thing is that symlinks and nics don't mix well
> and so it seems impossible to satisfy both the above
> requirements at the same time. This is an area that
> IMHO could be better addressed by the Linux community.

correct.

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux

2009-03-24 18:12:36

by Bill Nottingham

Subject: Re: Network Device Naming mechanism and policy

Matt Domsch said:
> > Now consider USB. Would the device name change depending on which USB
> > port you plugged it into? Or is USB just a single slot, in which case
> > what happens when you have two USB ethernet devices?
> >
> > The Apple USB Ethernet device in my iPhone is not the USB Wireless
> > adapter I own, both have very different networking configurations.
>
> we would obviously need a solution. eth_usb_{something} perhaps.

Right, but having biosdevname chase each new bus that comes along
sounds iffy. I'd prefer /dev/net/by-name symlinks, if at all
possible. But that's a lot of code that I'm not prepared to write.

Bill

2009-03-24 18:21:13

by Scott James Remnant

Subject: Re: Network Device Naming mechanism and policy

On Tue, 2009-03-24 at 14:12 -0400, Bill Nottingham wrote:

> Matt Domsch said:
> > > Now consider USB. Would the device name change depending on which USB
> > > port you plugged it into? Or is USB just a single slot, in which case
> > > what happens when you have two USB ethernet devices?
> > >
> > > The Apple USB Ethernet device in my iPhone is not the USB Wireless
> > > adapter I own, both have very different networking configurations.
> >
> > we would obviously need a solution. eth_usb_{something} perhaps.
>
> Right, but having biosdevname chase each new bus that comes along
> sounds iffy. I'd prefer /dev/net/by-name symlinks, if at all
> possible. But that's a lot of code that I'm not prepared to write.
>
Not to mention that All The World Is Not x86

Scott
--
Scott James Remnant

2009-03-24 18:49:52

by David Lang

Subject: Re: Network Device Naming mechanism and policy

On Tue, 24 Mar 2009, Matt Domsch wrote:

> You may recall http://lkml.org/lkml/2006/9/29/268, wherein I described
> network device enumeration and naming challenges, and several possible
> fixes. Of these, Fix #1 (fix the PCI device list to be sorted
> breadth-first) has been implemented in the kernel, and Fix #3 (system
> board routing rules) have been implemented on Dell PowerEdge 10G and
> 11G servers (11G begin selling RSN).
>
> However, these have not been completely satisfactory. In particular,
> it keeps getting harder and harder to route PCI-Express lanes to
> guarantee the same ordering between a depth-first and breadth-first
> walk, and it turns out, that isn't sufficient anyhow.
>
>
> Problem: Users expect on-motherboard NICs to be named eth0..ethN. This can be difficult to achieve.

I dispute this statement.

I have several hundred servers that have the on-motherboard NICs as the
last ones.

anyone who's been making the assumption you describe will have been
running into problems for many years.

it's just not a valid assumption.

> Ethernet device names are initially assigned by the kernel, and may be
> changed by udev or nameif in userspace. The initial name assigned by
> the kernel is in monotonically increasing order, starting with eth0.
> In this instance, the enumeration directly leads to an assigned name.
>
> Complications:
>
> 1) Devices are discovered, and presented to the kernel for name
> assignment, based on several factors:
>
> a) the kernel hotplug mechanism emits events for udev to catch, to
>
>
> b) udev may run modprobes in parallel. It guarantees that the
>
> To get any consistent ordering now, one of two things must
> happen:
>
> i) drivers must be loaded before udev begins loading drivers
> (either very early in initscripts, or in the initial ramdisk).
> ii) something must "fix up" the kernel-assigned names after
> udev's modprobes complete. udev does this as well.
>
> 2) udev may have rules to change the device names. This is most often
> seen in the '70-persistent-net.rules' file. Here we have
> additional challenges:
>
> a) this does not exist the first time devices are discovered; the
>
> b) it introduces state (MAC addresses) to the system, on a system
>
> c) udev may not always be able to change a device's name. If udev
>

not everyone uses udev. I compile the necessary drivers into the kernel
and don't need udev to get interfaces.

> 3) End users have the (reasonable?) expectation that NIC ports

as noted above, only some users have this unrealistic expectation.

> 4) When adding a network card to an existing system, what should the
> ports on the new card be named? If it is added, they will be named
> ethN+1... above the existing named cards. This means a (new)
> add-in card in PCI slot 3 may have ports named eth5 and eth6, while
> an add-in card in PCI slot 5 may have ports named eth2 and eth3.
> This is not intuitive.

this approach causes serious problems in a few cases, including

1. a NIC goes bad and you replace it. now all the configs change

2. you reinstall a box and its interface names change.

David Lang

2009-03-24 18:52:27

by David Lang

Subject: Re: Network Device Naming mechanism and policy

On Tue, 24 Mar 2009, Dan Williams wrote:

> On Tue, 2009-03-24 at 17:21 +0100, Patrick McHardy wrote:
>> Matt Domsch wrote:
>>> 2) udev may have rules to change the device names. This is most often
>>> seen in the '70-persistent-net.rules' file. Here we have
>>> additional challenges:
>>>
>>> ...
>>>
>>> c) udev may not always be able to change a device's name. If udev
>>> uses the kernel assignment namespace (ethN), then a rename of
>>> eth0->eth1 may require renaming eth1->eth0 (or something else).
>>> Udev operates on a single device instance at a time, it becomes
>>> difficult to switch names around for multiple devices, within
>>> the single namespace.
>>
>> I would classify this as a bug, especially the fact that udev doesn't
>> undo a failed rename, so you end up with ethX_rename. Virtual devices
>> using the same MAC address trigger this reliably unless you add
>> exceptions to the udev rules.
>
> Any particular reason the MAC addresses are the same? This came up a
> while ago with the 'dnet' device in the thread "Dave DNET ethernet
> controller".
>
> If the MAC address isn't a UUID for the device, then *what* is?

I have seen systems (I think they were Sun boxes) where the _machine_ had
a MAC address, and it used that same MAC on all interfaces.

this is convenient for some things, but not for others.

what's unique and reproducible is the discovery order

David Lang

2009-03-24 19:22:55

by Matt Domsch

Subject: Re: Network Device Naming mechanism and policy

On Tue, Mar 24, 2009 at 11:49:26AM -0700, [email protected] wrote:
> On Tue, 24 Mar 2009, Matt Domsch wrote:
>
> >You may recall http://lkml.org/lkml/2006/9/29/268, wherein I described
> >network device enumeration and naming challenges, and several possible
> >fixes. Of these, Fix #1 (fix the PCI device list to be sorted
> >breadth-first) has been implemented in the kernel, and Fix #3 (system
> >board routing rules) have been implemented on Dell PowerEdge 10G and
> >11G servers (11G begin selling RSN).
> >
> >However, these have not been completely satisfactory. In particular,
> >it keeps getting harder and harder to route PCI-Express lanes to
> >guarantee the same ordering between a depth-first and breadth-first
> >walk, and it turns out, that isn't sufficient anyhow.
> >
> >
> >Problem: Users expect on-motherboard NICs to be named eth0..ethN. This
> >can be difficult to achieve.
>
> I dispute this statement.
>
> I have several hundred servers that have the on-motherboard NICs as the
> last ones.
>
> anyone who's been making the assumption you describe will have been
> running into problems for many years.
>
> it's just not a valid assumption.

I agree it's not a valid assumption.

People seem to want two things with names:
1) that devices be named deterministically
2) that the determinism doesn't change on a per-platform or
per-configuration-of-a-platform basis.

This tends to mean they want the onboard devices named first, then the
add-in devices named. But not necessarily. I would hope to have a
deterministic naming method that would work for most people by
default, but that could be changed (in userspace) as necessary.

> >4) When adding a network card to an existing system, what should the
> > ports on the new card be named? If it is added, they will be named
> > ethN+1... above the existing named cards. This means a (new)
> > add-in card in PCI slot 3 may have ports named eth5 and eth6, while
> > an add-in card in PCI slot 5 may have ports named eth2 and eth3.
> > This is not intuitive.
>
> this approach causes serious problems in a few cases, including
>
> 1. a NIC goes bad and you replace it. now all the configs change
>
> 2. you reinstall a box and its interface names change.

Right. These cases are only deterministic because they start from a
known state; change or remove that state, and you're back to
non-deterministic.

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux

2009-03-24 21:02:00

by Alan

Subject: Re: Network Device Naming mechanism and policy

> this is convenient for some things, but not for others.
>
> what's unique and reproducible is the discovery order

Not in the case of things like USB...

2009-03-24 22:58:23

by David Miller

Subject: Re: Network Device Naming mechanism and policy

From: Matt Domsch <[email protected]>
Date: Tue, 24 Mar 2009 10:46:17 -0500

> Problem: Users expect on-motherboard NICs to be named eth0..ethN.
> This can be difficult to achieve.

I learned a long time ago that eth0 et al. have zero meaning.

If the system firmware folks gave us topology information with respect
to these things, we could export something that tools such as
NetworkManager, iproute2, etc. could use.

For example, if we were told that PCI device "domain:bus:dev:fn" has
string label "Onboard Ethernet 0" then we could present that to the
user.

Changing how the actual network device name is determined is going to
have zero traction.

So, please, put mapping tables into the ACPI or similar and then
programs can go:

for_each_network_device(name) {
	fd = open(name);
	label = get_system_label(fd, name);
	present_to_user(label, name);
}

This "get_system_label()" thing can be an ethtool ioctl, some
rtnetlink call, or similar. In the kernel, a generic routine would
exist for major bus types to make the mapping translation, and drivers
would call these.

For PCI it might take the PCI device pointer and try to fish
out a string from the ACPI layer.

For OpenFirmware we might just simply give the full device path,
or a matching device alias name.

That's the only model which allows a smooth transition and
no major infrastructure changes.

I guess it's easier to spew about MAC addresses and other
irrelevant topics than try to solve this problem properly. :-)
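
As a rough userspace illustration of the loop above (a sketch under
stated assumptions, not an existing API): the helper below walks a
sysfs-style directory, takes each interface's bus address from its
"device" symlink, and looks that address up in a label table. The
label table stands in for whatever firmware mapping becomes available,
and get_system_label() is hypothetical here, just as in the pseudocode.

```python
import os

def pci_address(sysfs_net_dir, ifname):
    """Return the bus address behind <ifname>/device, or None.

    On Linux, /sys/class/net/<ifname>/device is a symlink to the bus
    device; for PCI NICs its basename looks like '0000:03:00.0'.
    Virtual interfaces such as lo have no 'device' entry.
    """
    link = os.path.join(sysfs_net_dir, ifname, "device")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.path.realpath(link))

def get_system_label(labels, address):
    """Hypothetical lookup: firmware-provided label for a bus address."""
    return labels.get(address)

def labelled_interfaces(sysfs_net_dir, labels):
    """Yield (label, ifname) pairs, mirroring present_to_user(label, name)."""
    for ifname in sorted(os.listdir(sysfs_net_dir)):
        address = pci_address(sysfs_net_dir, ifname)
        yield get_system_label(labels, address), ifname
```

Note that the kernel-assigned names are left untouched; the label is
only extra information presented next to them.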

2009-03-24 23:51:23

by Greg KH

Subject: Re: Network Device Naming mechanism and policy

On Tue, Mar 24, 2009 at 09:02:14PM +0000, Alan Cox wrote:
> > this is convenient for some things, but not for others.
> >
> > what's unique and reproducible is the discovery order
>
> Not in the case of things like USB...

Or even PCI.

/me pats his laptop that reassigns PCI device ids randomly every 3rd or so boot.

2009-03-25 20:22:58

by Chris Friesen

Subject: Re: Network Device Naming mechanism and policy

David Miller wrote:
> From: Matt Domsch <[email protected]>
> Date: Tue, 24 Mar 2009 10:46:17 -0500
>
>> Problem: Users expect on-motherboard NICs to be named eth0..ethN.
>> This can be difficult to achieve.
>
> I learned a long time ago that eth0 et al. have zero meaning.
>
> If the system firmware folks gave us topology information with respect
> to these things, we could export something that tools such as
> NetworkManager, iproute2, etc. could use.

<snip>

> I guess it's easier to spew about MAC addresses and other
> irrelevant topics than try to solve this problem properly. :-)

What about things like USB network adapters where the topology is not
fixed? Presumably we would want to use some sort of unique identifier,
and the MAC comes to mind. Of course, then you run into the problem of
how to deal with duplicate MACs.

Chris

2009-03-26 16:40:15

by Matt Domsch

Subject: Re: Network Device Naming mechanism and policy

On Tue, Mar 24, 2009 at 03:57:56PM -0700, David Miller wrote:
> From: Matt Domsch <[email protected]>
> Date: Tue, 24 Mar 2009 10:46:17 -0500
>
> > Problem: Users expect on-motherboard NICs to be named eth0..ethN.
> > This can be difficult to achieve.
>
> I learned a long time ago that eth0 et al. have zero meaning.
>
> If the system firmware folks gave us topology information with respect
> to these things, we could export something that tools such as
> NetworkManager, iproute2, etc. could use.
>
> For example, if we were told that PCI device "domain:bus:dev:fn" has
> string label "Onboard Ethernet 0" then we could present that to the
> user.
>
> Changing how the actual network device name is determined is going to
> have zero traction.
>
> So, please, put mapping tables into the ACPI or similar and then
> programs can go:
>
> for_each_network_device(name) {
> 	fd = open(name);
> 	label = get_system_label(fd, name);
> 	present_to_user(label, name);
> }


Your wish is my command. DMTF SMBIOS 2.6 specification
http://www.dmtf.org/standards/smbios/ contains changes which provide
this for PCI devices.

Specifically, Type 9 ("System Slots") was extended to include the PCI
domain/bus/device/function for each slot. Type 10 ("On Board Devices
Information") could not be extended, thus it was deprecated, and new
Type 41 ("Onboard Devices Extended Information") was created to be
extensible and now includes PCI domain/bus/device/function
information. Both Type 9 and Type 41 include a String field which
hopefully has a more descriptive value, such as "Onboard Ethernet
Broadcom 5808 NIC 1" in the case of some Dell servers.

Shipping Dell 10G (and very soon 11G) server BIOS includes this
information. biosdevname can use this to report device names. Some
HP systems have a vendor-specific SMBIOS extension to provide a
similar mapping; biosdevname can report this as well.
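
To make the Type 41 layout concrete, here is a minimal decoding sketch.
It assumes the caller already has the raw bytes of one structure
(formatted area followed by its string set); how those bytes are
obtained is left out. The function name and the returned dictionary
keys are illustrative, not taken from biosdevname.

```python
import struct

ETHERNET = 0x05  # SMBIOS onboard device type value for Ethernet

def parse_type41(record):
    """Decode one raw SMBIOS Type 41 structure (formatted area + strings)."""
    rtype, length, _handle = struct.unpack_from("<BBH", record, 0)
    if rtype != 41 or length < 0x0B:
        raise ValueError("not a Type 41 structure")
    # Offsets 4..10: reference designation string number, device type,
    # device type instance, segment group (PCI domain), bus, device/function.
    ref_str, dev_type, instance, segment, bus, devfn = struct.unpack_from(
        "<BBBHBB", record, 4)
    # The string set follows the formatted area: NUL-terminated strings,
    # referenced by 1-based index (0 means "no string").
    strings = record[length:].split(b"\0")
    label = strings[ref_str - 1].decode("ascii", "replace") if ref_str else ""
    return {
        "label": label,
        "enabled": bool(dev_type & 0x80),   # bit 7: device status
        "type": dev_type & 0x7F,            # bits 6:0: device type
        "instance": instance,
        # bits 7:3 of devfn are the device, bits 2:0 the function
        "address": "%04x:%02x:%02x.%x" % (segment, bus, devfn >> 3, devfn & 7),
    }
```

The resulting "address" has the same domain:bus:dev.fn form that sysfs
uses for PCI devices, so it can serve as the key for matching a label to
a kernel network interface.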

> This "get_system_label()" thing can be an ethtool ioctl, some
> rtnetlink call, or similar. In the kernel, a generic routine would
> exist for major bus types to make the mapping translation, and drivers
> would call these.
>
> For PCI it might take the PCI device pointer and try to fish
> out a string from the ACPI layer.
>
> For OpenFirmware we might just simply give the full device path,
> or a matching device alias name.
>
> That's the only model which allows a smooth transition and
> no major infrastructure changes.

While I'd be happy for NetworkManager to present these SMBIOS-provided
human-parsable names when available, the names aren't terribly
meaningful in a programmatic fashion. The users I've encountered are
looking for a programmatic way to say:

The first LOM is my management/admin NIC. The second LOM is my bulk
traffic NIC. The first add-in card is my backup NIC.

meaning we still need a translation from "how I want to use a NIC" to
"which NIC should I plug the cable into". The SMBIOS names don't
completely solve this.

Hence my desire of having a way to have multiple alternate names for
the same interface. One such name would be the full SMBIOS string.
Another would be a bus topology name. A third could be a "how do I use
it" name. Analogous to devices represented in /dev using symlinks for
these other names. I don't care if it's symlinks in /dev or some
other mechanism.

Thanks,
Matt

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux

2009-03-26 20:19:24

by Dan Williams

Subject: Re: Network Device Naming mechanism and policy

On Thu, 2009-03-26 at 11:39 -0500, Matt Domsch wrote:
> On Tue, Mar 24, 2009 at 03:57:56PM -0700, David Miller wrote:
> > From: Matt Domsch <[email protected]>
> > Date: Tue, 24 Mar 2009 10:46:17 -0500
> >
> > > Problem: Users expect on-motherboard NICs to be named eth0..ethN.
> > > This can be difficult to achieve.
> >
> > I learned a long time ago that eth0 et al. have zero meaning.
> >
> > If the system firmware folks gave us topology information with respect
> > to these things, we could export something that tools such as
> > NetworkManager, iproute2, etc. could use.
> >
> > For example, if we were told that PCI device "domain:bus:dev:fn" has
> > string label "Onboard Ethernet 0" then we could present that to the
> > user.
> >
> > Changing how the actual network device name is determined is going to
> > have zero traction.
> >
> > So, please, put mapping tables into the ACPI or similar and then
> > programs can go:
> >
> > for_each_network_device(name) {
> > 	fd = open(name);
> > 	label = get_system_label(fd, name);
> > 	present_to_user(label, name);
> > }
>
>
> Your wish is my command. DMTF SMBIOS 2.6 specification
> http://www.dmtf.org/standards/smbios/ contains changes which provide
> this for PCI devices.
>
> Specifically, Type 9 ("System Slots") was extended to include the PCI
> domain/bus/device/function for each slot. Type 10 ("On Board Devices
> Information") could not be extended, thus it was deprecated, and new
> Type 41 ("Onboard Devices Extended Information") was created to be
> extensible and now includes PCI domain/bus/device/function
> information. Both Type 9 and Type 41 include a String field which
> hopefully has a more descriptive value, such as "Onboard Ethernet
> Broadcom 5808 NIC 1" in the case of some Dell servers.
>
> Shipping Dell 10G (and very soon 11G) server BIOS includes this
> information. biosdevname can use this to report device names. Some
> HP systems have a vendor-specific SMBIOS extension to provide a
> similar mapping; biosdevname can report this as well.
>
> > This "get_system_label()" thing can be an ethtool ioctl, some
> > rtnetlink call, or similar. In the kernel, a generic routine would
> > exist for major bus types to make the mapping translation, and drivers
> > would call these.
> >
> > For PCI it might take the PCI device pointer and try to fish
> > out a string from the ACPI layer.
> >
> > For OpenFirmware we might just simply give the full device path,
> > or a matching device alias name.
> >
> > That's the only model which allows a smooth transition and
> > no major infrastructure changes.
>
> While I'd be happy for NetworkManager to present these SMBIOS-provided
> human-parsable names when available, the names aren't terribly
> meaningful in a programmatic fashion. The users I've encountered are
> looking for a programmatic way to say:
>
> The first LOM is my management/admin NIC. The second LOM is my bulk
> traffic NIC. The first add-in card is my backup NIC.

nm-applet could support some sort of "named" adapters, though I'd rather
have this done with udev rules (or something like that) so that the
NIC's common name would be consistent in both the CLI and in the GUI.

The only reason nm-applet does what it does now (pulling VID/PID and
dropping stupid words like "Corporation") is so the user has *some* clue
what NIC they are about to touch; using "eth0" and "eth1" and "eth2"
isn't very helpful. But the distinction between "Intel Gigabit
Ethernet" and "D-Link 10/100 USB Adapter" is quite a bit easier to grasp
at a glance.

Dan

> meaning we still need a translation from "how I want to use a NIC" to
> "which NIC should I plug the cable into". The SMBIOS names don't
> completely solve this.
>
> Hence my desire of having a way to have multiple alternate names for
> the same interface. One such name would be the full SMBIOS string.
> Another would be a bus topology name. A third could be a "how do I use
> it" name. Analogous to devices represented in /dev using symlinks for
> these other names. I don't care if it's symlinks in /dev or some
> other mechanism.
>
> Thanks,
> Matt
>

2009-03-26 20:20:20

by Dan Williams

Subject: Re: Network Device Naming mechanism and policy

On Wed, 2009-03-25 at 14:22 -0600, Chris Friesen wrote:
> David Miller wrote:
> > From: Matt Domsch <[email protected]>
> > Date: Tue, 24 Mar 2009 10:46:17 -0500
> >
> >> Problem: Users expect on-motherboard NICs to be named eth0..ethN.
> >> This can be difficult to achieve.
> >
> > I learned a long time ago that eth0 et al. have zero meaning.
> >
> > If the system firmware folks gave us topology information with respect
> > to these things, we could export something that tools such as
> > NetworkManager, iproute2, etc. could use.
>
> <snip>
>
> > I guess it's easier to spew about MAC addresses and other
> > irrelevant topics than try to solve this problem properly. :-)
>
> What about things like USB network adapters where the topology is not
> fixed? Presumably we would want to use some sort of unique identifier,
> and the MAC comes to mind. Of course, then you run into the problem of
> how to deal with duplicate MACs.

USB devices do have a serial number field in the descriptors, but that
only sometimes gets populated with sensible values. More often than not
it's just zeros. But worth checking if the MAC isn't set yet.

Dan

2009-03-27 16:07:27

by Len Brown

Subject: Re: Network Device Naming mechanism and policy


> > > So, please, put mapping tables into the ACPI or similar

ACPI added _PLD (Physical Device Location) back in 3.0, ISTR.
However, searching my archives, I have yet to see a single instance
of its use in the field.

ACPI also supplies the slot number stuff, which is exported via
the existing pci_slot driver.

cheers,
Len Brown, Intel Open Source Technology Center

2009-03-31 14:23:17

by Kurt Van Dijck

Subject: Re: Network Device Naming mechanism and policy

My idea as a user, having configured some servers:

On Tue, Mar 24, 2009 at 05:46:17PM +0200, Matt Domsch wrote:
>
> Problem: Users expect on-motherboard NICs to be named eth0..ethN. This can be difficult to achieve.

From the kernel's point of view, there should be no preference. If users
expect some numbering, I believe udev provides all the tools.
>
> Ethernet device names are initially assigned by the kernel, and may be
> changed by udev or nameif in userspace. The initial name assigned by
> the kernel is in monotonically increasing order, starting with eth0.
> In this instance, the enumeration directly leads to an assigned name.
The problem here is the monotonically increasing order. I never rename ethX
back to the monotonic ethX numbering. IMHO, renaming eth0 to eth1 sounds
redundant.
I rename ethX to lan, wan, wlan, remote, lan0, lan1, ...
This naming _cannot_ conflict.
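
A toy model of why that holds (a sketch only; the table and function
names are made up for illustration): a rename acts on one device at a
time and is refused when the target name is already taken, so swapping
two names inside the ethN namespace needs a scratch name, while moving
to names outside that namespace never collides with kernel-assigned ones.

```python
def rename(table, old, new):
    """Rename one interface; refuse if the target name is already in use."""
    if new in table:
        raise FileExistsError(new)
    table[new] = table.pop(old)

def swap_via_temporary(table, a, b, tmp="tmp_rename"):
    """Swapping two names takes three single-device renames."""
    rename(table, a, tmp)
    rename(table, b, a)
    rename(table, tmp, b)

# name -> device identity (placeholder strings for this example)
table = {"eth0": "nic-A", "eth1": "nic-B"}
swap_via_temporary(table, "eth0", "eth1")   # needs the scratch name
rename(table, "eth0", "wan")                # no collision possible
rename(table, "eth1", "lan")
```

If the swap is interrupted after the first step, a device is left under
the scratch name, which matches the ethX_rename leftovers mentioned
earlier in the thread.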
>
>
> To ease this complication, SMBIOS 2.6 includes a mechanism for
> BIOS to specify its expected ordering of devices, for naming
> purposes. Tools such as biosdevname use this information.
I'd prefer not to rely on BIOS tools; not every system has a (stable) BIOS.
>
>

2009-04-09 14:58:35

by Matt Domsch

Subject: Re: Network Device Naming mechanism and policy

On Tue, Mar 24, 2009 at 03:57:56PM -0700, David Miller wrote:
> From: Matt Domsch <[email protected]>
> Date: Tue, 24 Mar 2009 10:46:17 -0500
>
> > Problem: Users expect on-motherboard NICs to be named eth0..ethN.
> > This can be difficult to achieve.
>
> I learned a long time ago that eth0 et al. have zero meaning.
>
> If the system firmware folks gave us topology information with respect
> to these things, we could export something that tools such as
> NetworkManager, iproute2, etc. could use.
>
> For example, if we were told that PCI device "domain:bus:dev:fn" has
> string label "Onboard Ethernet 0" then we could present that to the
> user.
>
> Changing how the actual network device name is determined is going to
> have zero traction.


David, would you be opposed to the additional device names being done
as device nodes in userspace, as several people suggested?

/sys/devices/*/net/ifindex already exports the netlink device index.
It would be trivial to add a /sys/devices/*/net/dev file, with
<major>:<minor> for a device, where <minor> = ifindex.

udev could then maintain /dev/net/by-{mac,path,...} as symlinks
to /dev/net/$kernelname.

Tools such as iproute's 'ip' could then be extended to look up their
'dev' argument by /dev path, resolve the symlink to name, get the device node, and
open the socket with the minor number / index (as normal).
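
A minimal sketch of the lookup step, assuming the proposed layout
exists (the /dev/net/by-* directories are part of this proposal, not
something present today, and resolve_dev_argument() is a made-up name).
It covers only the translation from a path to the kernel name, not the
device-node or socket handling.

```python
import os

def resolve_dev_argument(arg, dev_net="/dev/net"):
    """Map a 'dev' argument to a kernel interface name.

    Plain names ('eth0') pass through unchanged. Paths are followed
    through any symlinks, and the final component is taken as the
    kernel name, e.g. /dev/net/by-mac/<mac> -> /dev/net/eth0 -> 'eth0'.
    """
    if os.sep not in arg:
        return arg
    target = os.path.realpath(arg)
    # Only accept paths that end up directly inside dev_net.
    if os.path.dirname(target) != os.path.realpath(dev_net):
        raise ValueError("%s does not resolve to a node in %s" % (arg, dev_net))
    return os.path.basename(target)
```

Existing invocations that pass a bare interface name keep working,
since anything without a path separator is returned as-is.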

Thanks,
Matt

--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & http://www.dell.com/linux