Subject: Re: network regression: cannot rename netdev twice

On Tue, 31 Jan 2012, Kay Sievers wrote:
> Please make sure nothing tries to swap netif names in userspace. We
> have given up that approach, because it is far too fragile to
> temporarily rename devices to be able to swap the names, and race
> against the loading of new kernel network drivers at the same time.

That's a damn fair reason, but the loss of that functionality could cause
trouble. In fact, at first glance, to me it looks like this has a large
potential for unleashing untold pain and suffering in the sysadmin ranks
unless early userspace can emulate it somehow.

Is it possible to configure the kernel to use something other than "eth#" as
its initial namespace for netif names? Or is there some other way to get
eth1 to be what you need eth1 to be during userland boot?

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh


2012-02-06 20:03:53

by Kay Sievers

Subject: Re: network regression: cannot rename netdev twice

On Sat, Feb 4, 2012 at 03:14, Henrique de Moraes Holschuh
<[email protected]> wrote:
> On Tue, 31 Jan 2012, Kay Sievers wrote:
>> Please make sure nothing tries to swap netif names in userspace. We
>> have given up that approach, because it is far too fragile to
>> temporarily rename devices to be able to swap the names, and race
>> against the loading of new kernel network drivers at the same time.
>
> That's a damn fair reason, but the loss of that functionality could cause
> trouble.  In fact, at first glance, to me it looks like this has a large
> potential for unleashing untold pain and suffering in the sysadmin ranks
> unless early userspace can emulate it somehow.
>
> Is it possible to configure the kernel to use something other than "eth#" as
> its initial namespace for netif names?  Or is there some other way to get
> eth1 to be what you need eth1 to be during userland boot?

I don't think there is a sane way to do that. Someone could add a
kernel command line parameter to switch ethX in the kernel to
something else, and create custom udev rules which match on device
properties and apply configured names which are ethX again. But for
all that, there will be no generally available support in common base
system tools, and we absolutely do not recommend anybody doing that.

Udev will not provide any help for that any more, not for automatic
device name reservation from a hotplug path, not for device name swaps
in the kernel namespace. It will only be allowed to rename devices to
a namespace that does not clash with the kernel's one.

People should use biosdevname's PCI-slot names, or on-board label
names as Dell does, for configuration-less stable names, or use
manually configured names such as 'internal', 'external', 'dmz', 'vpn'
and so on.

I think we should stop pretending we can solve the problems that
result from simple enumeration, which depends on device-discovery
order. These numbers can never be stable and can never work reliably
in the reality we are working with.

It's time to leave these false promises behind us and move on, and
that means no stable ethX names anymore.

Kay

Subject: Re: network regression: cannot rename netdev twice

On Mon, 06 Feb 2012, Kay Sievers wrote:
> On Sat, Feb 4, 2012 at 03:14, Henrique de Moraes Holschuh
> <[email protected]> wrote:
> > Is it possible to configure the kernel to use something other than "eth#" as
> > its initial namespace for netif names?  Or is there some other way to get
> > eth1 to be what you need eth1 to be during userland boot?
>
> I don't think there is a sane way to do that. Someone could add a
> kernel command line parameter to switch ethX in the kernel to
> something else, and create custom udev rules which match on device
> properties and apply configured names which are ethX again. But for
> all that, there will be no generally available support in common base
> system tools, and we absolutely do not recommend anybody doing that.

What sort of impact analysis on userspace was done about this change?

Nobody in his right mind would go back to the dark ages of uncontrolled
ifnames. You're effectively forcing everybody with a clue away from the
eth# namespace.

Just to be very clear: the impact of this is the need to change the
interface names on potentially millions of lines of firewall rules and
scripts out there, as well as tracking down stuff (mostly scripts) that
special-cases the eth prefix.

Is there a really good reason why we cannot have a way to move the
kernel away from the eth# namespace at boot (through a kernel parameter,
maybe with the default namespace set at compile time), AND keep the
"common base system tools" support to assign ifname based on MAC
addresses that we have right now?

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2012-02-08 03:50:37

by Kay Sievers

Subject: Re: network regression: cannot rename netdev twice

On Wed, Feb 8, 2012 at 03:00, Henrique de Moraes Holschuh
<[email protected]> wrote:
> On Mon, 06 Feb 2012, Kay Sievers wrote:
>> On Sat, Feb 4, 2012 at 03:14, Henrique de Moraes Holschuh
>> <[email protected]> wrote:
>> > Is it possible to configure the kernel to use something other than "eth#" as
>> > its initial namespace for netif names?  Or is there some other way to get
>> > eth1 to be what you need eth1 to be during userland boot?
>>
>> I don't think there is a sane way to do that. Someone could add a
>> kernel command line parameter to switch ethX in the kernel to
>> something else, and create custom udev rules which match on device
>> properties and apply configured names which are ethX again. But for
>> all that, there will be no generally available support in common base
>> system tools, and we absolutely do not recommend anybody doing that.
>
> What sort of impact analysis on userspace was done about this change?

None. It will just not be supported for new setups. Existing ones will
do what they always did.

> Nobody in his right mind would go back to the dark ages of uncontrolled
> ifnames.  You're effectively forcing everybody with a clue away from the
> eth# namespace.

Yes. It's a game we have lost and we will not win in the future. I
gave up, and I warn everybody who thinks it's simple to manage.

> Just to be very clear: the impact of this is the need to change the
> interface names on potentially millions of lines of firewall rules and
> scripts out there, as well as tracking down stuff (mostly scripts) that
> special-cases the eth prefix.

Yeah, and for good, ethX is a pretty much random kernel name, and I
personally will no longer work on conceptually broken infrastructure
that can never deliver what it seems to promise. In the longer run,
tools need to be fixed to automatically handle changing names, or not
care about the names at all, or names need to be explicitly set up
outside the ethX namespace to be predictable.

After years of working in that area I will stop working on these hacks
to promise stable ethX names. It was just wrong, like enumerations
always are in hotplug setups.

> Is there a really good reason why we cannot have a way to move the
> kernel away from the eth# namespace at boot (through a kernel parameter,
> maybe with the default namespace set at compile time),

Could work, but I don't think it is worth it. Simple enumeration, and
automatic persistent on-disk device name reservation in a flat
number-range is just a very flawed concept. I'm not interested in
working on that, but that surely should not stop anybody from trying
and providing tools that can do that.

> AND keep the
> "common base system tools" support to assign ifname based on MAC
> addresses that we have right now?

Not provided by udev's default setup, which did persistent name
reservation in the device hotplug path. It is already disabled and
will be entirely removed from the source tree some day. Other tools
can still try to provide that. But I declare that model as officially
failed and udev will not even try anything like that anymore.

People who need predictable interface names should just manually
configure custom/descriptive names, or names which are reliably
derived from the hardware, like firmware-provided names or the pci
slot number.

Kay

2012-02-08 06:43:17

by Valdis Klētnieks

Subject: Re: network regression: cannot rename netdev twice

On Wed, 08 Feb 2012 04:50:15 +0100, Kay Sievers said:

> After years of working in that area I will stop working on these hacks
> to promise stable ethX names. It was just wrong, like enumerations
> always are in hotplug setups.

So (real world case) I've got a server that's got a 1G ethernet connected to
the public net, a 1G ethernet that's a cluster management network, and
a 10G ethernet that connects to our HPC clusters.

And I want to add iptables rules that distinguish based on interface. Currently
I can nail the management net to eth0, the public net to eth1, and the 10G to
eth2, and then just add "-i eth1" or whatever in the iptables ruleset.

I really don't care if the 0/1/2 move around - but if we're not having nailed-down
interface names, what will take the place of '-i ethN' in iptables?

> People who need predictable interface names should just manually
> configure custom/descriptive names, or names which are reliably
> derived from the hardware, like firmware-provided names or the pci
> slot number.

Or is this sort of thing in /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:25:90:0b:f2:80", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
what you are trying to move to, and my systems are already onboard and
I should just move along, nothing to see here? ;)
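
If the interfaces were renamed to descriptive names, existing rulesets
that use '-i ethN' would need a mechanical rewrite. A minimal sketch of
such a rewrite follows, assuming the rules are available as
iptables-save style text. The mapping and the names net-mgt, net-pub and
net-10g are hypothetical, and the pattern only looks at -i/-o and their
long forms, so it would also touch matching text inside a rule comment.

```python
import re

# Hypothetical mapping, following the roles described in this message:
# eth0 = management net, eth1 = public net, eth2 = 10G cluster link.
RENAMES = {"eth0": "net-mgt", "eth1": "net-pub", "eth2": "net-10g"}

# Matches "-i NAME", "-o NAME", "--in-interface NAME", "--out-interface NAME"
# when the option starts a whitespace-separated token.
IFACE_OPT = re.compile(r"(?<!\S)(-[io]|--(?:in|out)-interface)(\s+)(\S+)")


def rename_interfaces(rules, renames=RENAMES):
    """Return the ruleset text with interface arguments renamed.

    Interface names that are not keys of `renames` are left untouched.
    """
    def repl(match):
        opt, gap, iface = match.groups()
        return f"{opt}{gap}{renames.get(iface, iface)}"

    return IFACE_OPT.sub(repl, rules)


sample = (
    "-A INPUT -i eth1 -p tcp --dport 22 -j DROP\n"
    "-A FORWARD -i eth0 -o eth2 -j ACCEPT\n"
    "-A INPUT -i lo -j ACCEPT"
)
print(rename_interfaces(sample))
```

A real migration would still need a review of every script that
special-cases the eth prefix; this only covers the rule text itself.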


2012-02-08 10:57:40

by Kay Sievers

Subject: Re: network regression: cannot rename netdev twice

On Wed, Feb 8, 2012 at 07:42, <[email protected]> wrote:
> On Wed, 08 Feb 2012 04:50:15 +0100, Kay Sievers said:
>
>> After years of working in that area I will stop working on these hacks
>> to promise stable ethX names. It was just wrong, like enumerations
>> always are in hotplug setups.
>
> So (real world case) I've got a server that's got a 1G ethernet connected to
> the public net, a 1G ethernet that's a cluster management network, and
> a 10G ethernet that connects to our HPC clusters.
>
> And I want to add iptables rules that distinguish based on interface. Currently
> I can nail the management net to eth0, the public net to eth1, and the 10G to
> eth2, and then just add "-i eth1" or whatever in the iptables ruleset.
>
> I really don't care if the 0/1/2 move around - but if we're not having nailed-down
> interface names, what will take the place of '-i ethN' in iptables?
>
>> People who need predictable interface names should just manually
>> configure custom/descriptive names, or names which are reliably
>> derived from the hardware, like firmware-provided names or the pci
>> slot number.
>
> Or is this sort of thing in /etc/udev/rules.d/70-persistent-net.rules
> SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:25:90:0b:f2:80", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
> what you are trying to move to, and my systems are already onboard and
> I should just move along, nothing to see here? ;)

Yeah, that's what we did in the past. It works fine if you never have
to swap names like eth0 and eth1, which requires freeing one of the
names with a temporary rename.

If another device is added by a different kernel module, or just a USB
network device is already plugged-in at bootup time, the parallel
loading of drivers might cause the kernel to create a new eth0 or eth1
just in the moment we have the temporary rename active and we want to
swap the names.

That model is just entirely flawed and will never work reliably
without creating an even bigger mess than we already have, by requiring
complex retry loops across multiple devices, or having global locks
including the kernel's device name allocation logic.

Let's just move on and stop pretending we want to or can solve these
problems. Simple device enumerations in hotplug setups cannot, by their
very definition, work in a predictable way. We should never have tried
to mess around here, and should just have moved on to something that
has at least the potential to work.

Kay
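
The race described above can be illustrated with a toy model. This is a
sketch under stated assumptions, not kernel or udev code: the class
KernelNamespace, the temporary name "rename-tmp" and the device labels
are all hypothetical, and the "parallel" driver load is simulated by
registering a device at the exact point where the name is free.

```python
class KernelNamespace:
    """Toy model of the kernel's flat netdev namespace (not real kernel code)."""

    def __init__(self):
        self.names = {}  # interface name -> device label

    def register(self, device):
        """Model a driver probe: the new device gets the lowest free ethN."""
        n = 0
        while f"eth{n}" in self.names:
            n += 1
        self.names[f"eth{n}"] = device
        return f"eth{n}"

    def rename(self, old, new):
        """Model a rename: it fails if the target name is already taken."""
        if new in self.names:
            raise FileExistsError(f"cannot rename {old} to {new}: name in use")
        self.names[new] = self.names.pop(old)


def swap(ns, a, b, hotplug=None):
    """Swap names a and b via a temporary name.

    `hotplug`, if given, is a device that gets registered in the window
    where `a` has been moved aside and its name is free.
    """
    tmp = "rename-tmp"
    ns.rename(a, tmp)         # frees name a
    if hotplug is not None:
        ns.register(hotplug)  # a parallel driver load grabs the free name
    ns.rename(b, a)           # fails if the hotplugged device took a
    ns.rename(tmp, b)


# Without interference the swap completes.
quiet = KernelNamespace()
quiet.register("nic-A")  # becomes eth0
quiet.register("nic-B")  # becomes eth1
swap(quiet, "eth0", "eth1")

# With a device appearing mid-swap, the second rename fails and one
# interface is left stranded under the temporary name.
racy = KernelNamespace()
racy.register("nic-A")
racy.register("nic-B")
try:
    swap(racy, "eth0", "eth1", hotplug="usb-nic")
    race_error = None
except FileExistsError as exc:
    race_error = str(exc)

print(quiet.names)
print(racy.names, race_error)
```

Recovering from the failed case is where the retry loops or global
locks mentioned above would come in; renaming into a namespace the
kernel never allocates from avoids the window entirely.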

2012-02-08 20:07:23

by Valdis Klētnieks

Subject: Re: network regression: cannot rename netdev twice

On Wed, 08 Feb 2012 11:57:18 +0100, Kay Sievers said:
> On Wed, Feb 8, 2012 at 07:42, <[email protected]> wrote:

> > Or is this sort of thing in /etc/udev/rules.d/70-persistent-net.rules
> > SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:25:90:0b:f2:80", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
> > what you are trying to move to, and my systems are already onboard and
> > I should just move along, nothing to see here? ;)
>
> Yeah, that's what we did in the past. It works fine if you never have
> to swap names like eth0 and eth1, which requires freeing one of the
> names with a temporary rename.

Well, if I had my druthers, I'd stick name="net-mgt", "net-pub", and "net-10g"
in the udev rules, and not care about 1/2/3 and race conditions, because
meaningful names are easier to not screw up (just last week found a system that
had eth1 and eth2 reversed in some iptables rules, wouldn't have happened if
they were -mgt and -pub).

Only thing stopping me is getting iptables to accept '-i net-10g', and the
distro /etc/sysconfig/network scripts like ifup and ifdown playing nice....

So it sounds like what I want as a sysadmin is the same thing you want
as a maintainer...
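
As a rough illustration of the descriptive-name approach, the sketch
below renders a rule in the same format as the 70-persistent-net.rules
line quoted earlier, but with a name outside the ethX namespace. The
helper udev_rule and its checks are hypothetical. The 15-character
limit reflects the kernel's IFNAMSIZ of 16 including the terminating
NUL, and lower-casing the MAC assumes sysfs reports the address in
lower case.

```python
import re

IFNAMSIZ = 16  # Linux interface name buffer size, including the trailing NUL
KERNEL_STYLE = re.compile(r"^eth\d+$")               # names the kernel hands out
MAC = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")  # colon-separated MAC


def udev_rule(mac, name):
    """Render one udev rule that names the NIC with address `mac` as `name`.

    Rejects names that would clash with the kernel's ethN namespace,
    so no swap or temporary rename is ever needed.
    """
    mac = mac.lower()
    if not MAC.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    if KERNEL_STYLE.match(name):
        raise ValueError(f"{name!r} clashes with the kernel's ethN names")
    if not name or len(name) >= IFNAMSIZ:
        raise ValueError(f"{name!r} must be 1 to {IFNAMSIZ - 1} characters")
    if "/" in name or any(ch.isspace() for ch in name):
        raise ValueError(f"{name!r} contains '/' or whitespace")
    return (
        'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
        f'ATTR{{address}}=="{mac}", ATTR{{type}}=="1", '
        f'KERNEL=="eth*", NAME="{name}"'
    )


# The MAC address is the one from the rule quoted in this thread.
print(udev_rule("00:25:90:0b:f2:80", "net-mgt"))
```

Whether iptables and the distro network scripts accept such names is a
separate question that this sketch does not address.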



2012-02-08 20:28:01

by Stephen Hemminger

Subject: Re: network regression: cannot rename netdev twice

Our customers would prefer network device names of the form
"Ethernet0/0" or "ge-0/0/0" (but I said no...)


2012-02-08 23:49:18

by Kay Sievers

Subject: Re: network regression: cannot rename netdev twice

On Wed, Feb 8, 2012 at 21:06, <[email protected]> wrote:
> On Wed, 08 Feb 2012 11:57:18 +0100, Kay Sievers said:
>> On Wed, Feb 8, 2012 at 07:42,  <[email protected]> wrote:
>
>> > Or is this sort of thing in /etc/udev/rules.d/70-persistent-net.rules
>> > SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:25:90:0b:f2:80", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
>> > what you are trying to move to, and my systems are already onboard and
>> > I should just move along, nothing to see here? ;)
>>
>> Yeah, that's what we did in the past. It works fine if you never have
>> to swap names like eth0 and eth1, which requires freeing one of the
>> names with a temporary rename.
>
> Well, if I had my druthers, I'd stick name="net-mgt", "net-pub", and "net-10g"
> in the udev rules, and not care about 1/2/3 and race conditions, because
> meaningful names are easier to not screw up (just last week found a system that
> had eth1 and eth2 reversed in some iptables rules, wouldn't have happened if
> they were -mgt and -pub).
>
> Only thing stopping me is getting iptables to accept '-i net-10g', and the
> distro /etc/sysconfig/network scripts like ifup and ifdown playing nice....
>
> So it sounds like what I want as a sysadmin is the same thing you want
> as a maintainer...

Yeah, that sounds very much like it is.

I want to push some responsibility to the admin, do less automagic,
and personally want to be less responsible for all the unintended
screw-up the automagic is causing everywhere.

Sure, the intention to keep the names like they always have been was
good, but pairing a good intention with a broken model to deliver it,
and continuing to pretend we can solve it, is the worst thing we can
do. :)

Kay