2008-10-22 15:23:31

by Benjamin Thery

[permalink] [raw]
Subject: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries


Support for network namespaces in mainline is pretty complete for
some time now, but there is still this issue with sysfs that prevents
more people to use it easily.

Reminder for those not aware of the netns/sysfs issue:

With network namespaces, the kernel must be able to support net devices
with the same name in different network namespaces: the most obvious
example being the loopback device, which exists in every namespace.
The remaining place where this doesn't work yet is sysfs.

In the last 12 months, Eric Biederman proposed different approaches
to support this and sent several patchsets to implement what he calls
"sysfs tagged directories". But unfortunately, there is still no
agreement on the patchset and its implementation.

See last round of comments there:
http://thread.gmane.org/gmane.linux.kernel/735612/focus=740050

So, currently testing network namespaces on a mainline kernel is a
pain and involves either to disable sysfs completely (argh) or to find
and manually apply Eric's latest patchset (was in gregkh's tree for a
short time, but unfortunately it was dumped out a few a weeks ago).


This patchset explores an alternative suggested by Serge Hallyn
to *temporarily* fix this issue. It introduces the modifications
needed to register in sysfs, the network devices belonging to child
network namespaces with a suffix appended to their name to avoid
potential conflicts.

http://thread.gmane.org/gmane.linux.kernel/735612/focus=741757

Network devices from the initial network namespace are untouched.
Their representation in sysfs (/sys/class/net/, ...) is unchanged.

Network devices from sub-network namespaces appear in sysfs
with a name that looks like this: device_name@netns_id
eg: lo@3, eth0@4e

See last patch of the series for the details.

Then, if needed in the child network namespace, we can filter
/sys/class/net contents with, for example:

* mount -t tmpfs /sys/class/net
* and manually link the right devices from /sys/devices/virtual/net
(ln -s ../../devices/virtual/net/lo@1 lo)

This is less elegant than Eric's approach, but is quite simple and
doesn't touch sysfs core code.

This patch applies on top of net-next-2.6.

Benjamin

--


2008-10-22 20:04:52

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

Benjamin Thery <[email protected]> writes:

> Support for network namespaces in mainline is pretty complete for
> some time now, but there is still this issue with sysfs that prevents
> more people to use it easily.

Ben your patchset is completely inappropriate.

Temporarily adding elements to the ABI that we intend to remove
is not a proper solution to this problem.

That user space visible ida you add is a namespace identifier that breaks
nested containers and migration. It is very very very wrong.

Eric

2008-10-22 20:20:29

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

On Wed, Oct 22, 2008 at 05:21:44PM +0200, Benjamin Thery wrote:
> Network devices from sub-network namespaces appear in sysfs
> with a name that looks like this: device_name@netns_id
> eg: lo@3, eth0@4e

How does the default udev rules as shipped by most distros handle the
renaming of the network device if the MAC address is duplicated like it
will be for these eth devices?

thanks,

greg k-h

2008-10-22 20:31:24

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

Quoting Eric W. Biederman ([email protected]):
> Benjamin Thery <[email protected]> writes:
>
> > Support for network namespaces in mainline is pretty complete for
> > some time now, but there is still this issue with sysfs that prevents
> > more people to use it easily.
>
> Ben your patchset is completely inappropriate.
>
> Temporarily adding elements to the ABI that we intend to remove
> is not a proper solution to this problem.
>
> That user space visible ida you add is a namespace identifier that breaks
> nested containers and migration. It is very very very wrong.

I disagree (not surprising :) completely. The well-known userspace
tools (ifconfig, ip, etc) will not see the lo@1, they'll see lo.
Userspace in a container can either umount /sys completely, or do

mount -t tmpfs none /sys/class/net
mount --bind /sys/devices/virtual/net/lo@1 /sys/class/net/lo

if they really want to, in which case only their view
of /sys/devices/virtual/net would be different.

Eric, would you hate this less if it was under some

CONFIG_SYSFS_NETNS_HACK

config variable?

-serge

2008-10-22 20:34:44

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] netns: Coexist with the sysfs limitations


To make testing of the network namespace simpler allow
the network namespace code and the sysfs code to be
compiled and run at the same time. To do this only
virtual devices are allowed in the additional network
namespaces and those virtual devices are not placed
in the kobject tree.

Since virtual devices don't actually do anything interesting
hardware wise that needs device management there should
be no loss in keeping them out of the kobject tree and
by implication sysfs. The gain in ease of testing
and code coverage should be significant.

I.e. people running distributions that make it next to
impossible to boot without sysfs should at be able to
boot a test kernel now.

Plus no ABIs are harmed with this patch.

Signed-off-by: Eric W. Biederman <[email protected]>
---
net/Kconfig | 2 +-
net/core/dev.c | 12 +++++++++++-
net/core/net-sysfs.c | 7 +++++++
3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index d789d79..8c3d97c 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -27,7 +27,7 @@ menu "Networking options"
config NET_NS
bool "Network namespace support"
default n
- depends on EXPERIMENTAL && !SYSFS && NAMESPACES
+ depends on EXPERIMENTAL && NAMESPACES
help
Allow user space to create what appear to be multiple instances
of the network stack.
diff --git a/net/core/dev.c b/net/core/dev.c
index b8a4fd0..a7f0461 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4449,6 +4449,15 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
if (dev->features & NETIF_F_NETNS_LOCAL)
goto out;

+#ifdef CONFIG_SYSFS
+ /* Don't allow real devices to be moved when sysfs
+ * is enabled.
+ */
+ err = -EINVAL;
+ if (dev->dev.parent)
+ goto out;
+#endif
+
/* Ensure the device has been registrered */
err = -EINVAL;
if (dev->reg_state != NETREG_REGISTERED)
@@ -4506,6 +4515,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
*/
dev_addr_discard(dev);

+ netdev_unregister_kobject(dev);
+
/* Actually switch the network namespace */
dev_net_set(dev, net);

@@ -4522,7 +4533,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
}

/* Fixup kobjects */
- netdev_unregister_kobject(dev);
err = netdev_register_kobject(dev);
WARN_ON(err);

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 92d6b94..85cb8bd 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -476,6 +476,10 @@ void netdev_unregister_kobject(struct net_device * net)
struct device *dev = &(net->dev);

kobject_get(&dev->kobj);
+
+ if (dev_net(net) != &init_net)
+ return;
+
device_del(dev);
}

@@ -501,6 +505,9 @@ int netdev_register_kobject(struct net_device *net)
#endif
#endif /* CONFIG_SYSFS */

+ if (dev_net(net) != &init_net)
+ return 0;
+
return device_add(dev);
}

--
1.5.3.rc6.17.g1911

2008-10-22 20:40:34

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH] netns: Coexist with the sysfs limitations

Eric W. Biederman wrote:
> To make testing of the network namespace simpler allow
> the network namespace code and the sysfs code to be
> compiled and run at the same time. To do this only
> virtual devices are allowed in the additional network
> namespaces and those virtual devices are not placed
> in the kobject tree.
>
> Since virtual devices don't actually do anything interesting
> hardware wise that needs device management there should
> be no loss in keeping them out of the kobject tree and
> by implication sysfs. The gain in ease of testing
> and code coverage should be significant.
>
> I.e. people running distributions that make it next to
> impossible to boot without sysfs should at be able to
> boot a test kernel now.
>
> Plus no ABIs are harmed with this patch.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Acked-by: Daniel Lezcano <[email protected]>

> ---
> net/Kconfig | 2 +-
> net/core/dev.c | 12 +++++++++++-
> net/core/net-sysfs.c | 7 +++++++
> 3 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/net/Kconfig b/net/Kconfig
> index d789d79..8c3d97c 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -27,7 +27,7 @@ menu "Networking options"
> config NET_NS
> bool "Network namespace support"
> default n
> - depends on EXPERIMENTAL && !SYSFS && NAMESPACES
> + depends on EXPERIMENTAL && NAMESPACES
> help
> Allow user space to create what appear to be multiple instances
> of the network stack.
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b8a4fd0..a7f0461 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4449,6 +4449,15 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
> if (dev->features & NETIF_F_NETNS_LOCAL)
> goto out;
>
> +#ifdef CONFIG_SYSFS
> + /* Don't allow real devices to be moved when sysfs
> + * is enabled.
> + */
> + err = -EINVAL;
> + if (dev->dev.parent)
> + goto out;
> +#endif
> +
> /* Ensure the device has been registrered */
> err = -EINVAL;
> if (dev->reg_state != NETREG_REGISTERED)
> @@ -4506,6 +4515,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
> */
> dev_addr_discard(dev);
>
> + netdev_unregister_kobject(dev);
> +
> /* Actually switch the network namespace */
> dev_net_set(dev, net);
>
> @@ -4522,7 +4533,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
> }
>
> /* Fixup kobjects */
> - netdev_unregister_kobject(dev);
> err = netdev_register_kobject(dev);
> WARN_ON(err);
>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 92d6b94..85cb8bd 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -476,6 +476,10 @@ void netdev_unregister_kobject(struct net_device * net)
> struct device *dev = &(net->dev);
>
> kobject_get(&dev->kobj);
> +
> + if (dev_net(net) != &init_net)
> + return;
> +
> device_del(dev);
> }
>
> @@ -501,6 +505,9 @@ int netdev_register_kobject(struct net_device *net)
> #endif
> #endif /* CONFIG_SYSFS */
>
> + if (dev_net(net) != &init_net)
> + return 0;
> +
> return device_add(dev);
> }
>


--






















































Sauf indication contraire ci-dessus:
Compagnie IBM France
Si?ge Social : Tour Descartes, 2, avenue Gambetta, La D?fense 5, 92400
Courbevoie
RCS Nanterre 552 118 465
Forme Sociale : S.A.S.
Capital Social : 542.737.118 ?
SIREN/SIRET : 552 118 465 02430

2008-10-22 21:04:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Eric W. Biederman ([email protected]):
>> Benjamin Thery <[email protected]> writes:
>>
>> > Support for network namespaces in mainline is pretty complete for
>> > some time now, but there is still this issue with sysfs that prevents
>> > more people to use it easily.
>>
>> Ben your patchset is completely inappropriate.
>>
>> Temporarily adding elements to the ABI that we intend to remove
>> is not a proper solution to this problem.
>>
>> That user space visible ida you add is a namespace identifier that breaks
>> nested containers and migration. It is very very very wrong.
>
> I disagree (not surprising :) completely. The well-known userspace
> tools (ifconfig, ip, etc) will not see the lo@1, they'll see lo.
> Userspace in a container can either umount /sys completely, or do

The well-known user space tools don't use /sys at all. Modern
network tools use rtnetlink (ip) old network tools use /proc/net.

Very few things actually use /sys and for those things lo@1 or
eth0@1 are completely useless except for implementing a FUSE
mock up of sysfs. But you don't need anything in sysfs to do
that as all of the interesting information is available through
/proc/net or rtnetlink.

>
> mount -t tmpfs none /sys/class/net
> mount --bind /sys/devices/virtual/net/lo@1 /sys/class/net/lo
>
> if they really want to, in which case only their view
> of /sys/devices/virtual/net would be different.
>
> Eric, would you hate this less if it was under some
>
> CONFIG_SYSFS_NETNS_HACK
>
> config variable?

No. ABI decisions are almost certainly irreversible.

If we need an immediate hack please see the patch I sent
in follow up. We can achieve everything Ben is doing by simply
keeping virtual devices out of the kobject tree. Keeping them
from showing up in sysfs.

Eric

2008-10-22 21:14:48

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

Greg KH <[email protected]> writes:

> On Wed, Oct 22, 2008 at 05:21:44PM +0200, Benjamin Thery wrote:
>> Network devices from sub-network namespaces appear in sysfs
>> with a name that looks like this: device_name@netns_id
>> eg: lo@3, eth0@4e
>
> How does the default udev rules as shipped by most distros handle the
> renaming of the network device if the MAC address is duplicated like it
> will be for these eth devices?

The mac address is not duplicated.

Further devices like eth0@4e are completely unusable to the udev
rules in the initial network namespace because they can not talk
to or affect them.

As I read it Ben's ``solution'' puts entries in sysfs that are
completely unusable to udev.

Eric

2008-10-22 21:21:42

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [PATCH] netns: Coexist with the sysfs limitations

Quoting Eric W. Biederman ([email protected]):
>
> To make testing of the network namespace simpler allow
> the network namespace code and the sysfs code to be
> compiled and run at the same time. To do this only
> virtual devices are allowed in the additional network
> namespaces and those virtual devices are not placed
> in the kobject tree.
>
> Since virtual devices don't actually do anything interesting
> hardware wise that needs device management there should
> be no loss in keeping them out of the kobject tree and
> by implication sysfs. The gain in ease of testing
> and code coverage should be significant.
>
> I.e. people running distributions that make it next to
> impossible to boot without sysfs should at be able to
> boot a test kernel now.
>
> Plus no ABIs are harmed with this patch.
>
> Signed-off-by: Eric W. Biederman <[email protected]>

Duh.

Tested-by: Serge Hallyn <[email protected]>
Acked-by: Serge Hallyn <[email protected]>

Thanks, Eric! Thanks, Benjamin!

-serge

2008-10-22 21:28:12

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

On Wed, Oct 22, 2008 at 02:08:26PM -0700, Eric W. Biederman wrote:
> Greg KH <[email protected]> writes:
>
> > On Wed, Oct 22, 2008 at 05:21:44PM +0200, Benjamin Thery wrote:
> >> Network devices from sub-network namespaces appear in sysfs
> >> with a name that looks like this: device_name@netns_id
> >> eg: lo@3, eth0@4e
> >
> > How does the default udev rules as shipped by most distros handle the
> > renaming of the network device if the MAC address is duplicated like it
> > will be for these eth devices?
>
> The mac address is not duplicated.

Ah, ok, I really don't think I want to know more :)

> Further devices like eth0@4e are completely unusable to the udev
> rules in the initial network namespace because they can not talk
> to or affect them.

Oh, good point.

> As I read it Ben's ``solution'' puts entries in sysfs that are
> completely unusable to udev.

That's not a good thing to do, if udev can't see them, than HAL can't
see them, then the rest of userspace usually has no idea they are
present either.

thanks,

greg k-h

2008-10-22 21:55:29

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

On Wed, 22 Oct 2008 14:01:59 -0700
[email protected] (Eric W. Biederman) wrote:

> "Serge E. Hallyn" <[email protected]> writes:
>
> > Quoting Eric W. Biederman ([email protected]):
> >> Benjamin Thery <[email protected]> writes:
> >>
> >> > Support for network namespaces in mainline is pretty complete for
> >> > some time now, but there is still this issue with sysfs that prevents
> >> > more people to use it easily.
> >>
> >> Ben your patchset is completely inappropriate.
> >>
> >> Temporarily adding elements to the ABI that we intend to remove
> >> is not a proper solution to this problem.
> >>
> >> That user space visible ida you add is a namespace identifier that breaks
> >> nested containers and migration. It is very very very wrong.
> >
> > I disagree (not surprising :) completely. The well-known userspace
> > tools (ifconfig, ip, etc) will not see the lo@1, they'll see lo.
> > Userspace in a container can either umount /sys completely, or do
>
> The well-known user space tools don't use /sys at all. Modern
> network tools use rtnetlink (ip) old network tools use /proc/net.
>
> Very few things actually use /sys and for those things lo@1 or
> eth0@1 are completely useless except for implementing a FUSE
> mock up of sysfs. But you don't need anything in sysfs to do
> that as all of the interesting information is available through
> /proc/net or rtnetlink.

Lots of scripts are starting use /sys for things. It is easier to
parse /sys/class/net than the output of ifconfig or ip

2008-10-22 23:04:47

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

Stephen Hemminger <[email protected]> writes:

>> The well-known user space tools don't use /sys at all. Modern
>> network tools use rtnetlink (ip) old network tools use /proc/net.
>>
>> Very few things actually use /sys and for those things lo@1 or
>> eth0@1 are completely useless except for implementing a FUSE
>> mock up of sysfs. But you don't need anything in sysfs to do
>> that as all of the interesting information is available through
>> /proc/net or rtnetlink.
>
> Lots of scripts are starting use /sys for things. It is easier to
> parse /sys/class/net than the output of ifconfig or ip

Yes. So we need the correct values in /sys/class/net.

Which is why sysfs and network namespaces do not coexist currently.

Which is why I recommend only placing devices in the initial network
namespace in sysfs for the short term.

Which is why we need to get all of the details correct when we
handle multiple network namespaces and sysfs.

I should have something working and should be sending patches out
shortly. Cleaning up sysfs and the device model enough so the changes
are not spooky is a long hard road unfortunately.

Eric

2008-10-23 04:14:34

by Kyle Moffett

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

On Wed, Oct 22, 2008 at 6:54 PM, Eric W. Biederman
<[email protected]> wrote:
> Stephen Hemminger <[email protected]> writes:
>>> The well-known user space tools don't use /sys at all. Modern
>>> network tools use rtnetlink (ip) old network tools use /proc/net.
>>>
>>> Very few things actually use /sys and for those things lo@1 or
>>> eth0@1 are completely useless except for implementing a FUSE
>>> mock up of sysfs. But you don't need anything in sysfs to do
>>> that as all of the interesting information is available through
>>> /proc/net or rtnetlink.
>>
>> Lots of scripts are starting use /sys for things. It is easier to
>> parse /sys/class/net than the output of ifconfig or ip
>
> Yes. So we need the correct values in /sys/class/net.
>
> Which is why sysfs and network namespaces do not coexist currently.

I definitely agree that "eth0@1" is a bad ide. I know for sure that a
relatively large number of system init scripts poke about in
/sys/class/net/$IFACE, as well as a number of the system installation
scripts. Those scripts include some that my company has written for
internal use and others supplied by distributions.

Daemons and such mostly use netlink, but for anything written in shell
it's much easier to do things like this:

for devpath in /sys/class/net/*; do
dev="${dev##/sys/class/net/}"
case "${dev}" in
[....]
esac
if [ "x${mac}" = "x$(cat ${devpath}/address)" ]; then
echo "Found MAC '${mac}': ${dev}"
fi
done

If I wanted to perform similar things with the output of ifconfig, it
would involve some much more fragile manual text parsing of the output
of the "ip" or "ifconfig" commands. And sometimes the "ifconfig"
command is outright wrong. If you have an interface name longer than
7 characters or so then some versions of ifconfig will truncate it
internally and display garbage.

While the "show only the root namespace" interim solution is
problematic for processes in network namespaces, I think it's more
important not to break things for admin tools in the root namespace.

Cheers,
Kyle Moffett

2008-10-23 09:03:18

by Benjamin Thery

[permalink] [raw]
Subject: Re: [PATCH] netns: Coexist with the sysfs limitations

Serge E. Hallyn wrote:
> Quoting Eric W. Biederman ([email protected]):
>> To make testing of the network namespace simpler allow
>> the network namespace code and the sysfs code to be
>> compiled and run at the same time. To do this only
>> virtual devices are allowed in the additional network
>> namespaces and those virtual devices are not placed
>> in the kobject tree.
>>
>> Since virtual devices don't actually do anything interesting
>> hardware wise that needs device management there should
>> be no loss in keeping them out of the kobject tree and
>> by implication sysfs. The gain in ease of testing
>> and code coverage should be significant.
>>
>> I.e. people running distributions that make it next to
>> impossible to boot without sysfs should at be able to
>> boot a test kernel now.
>>
>> Plus no ABIs are harmed with this patch.

>> Signed-off-by: Eric W. Biederman <[email protected]>
>
> Duh.
>
> Tested-by: Serge Hallyn <[email protected]>
> Acked-by: Serge Hallyn <[email protected]>

Oh, this patch is short, clean, and the limitation introduced isn't too
annoying for testing netns right now.

At least, my proposal provoked some reactions :)

BTW, there's a second limitation with your patch:
we can't rename the net devices in the additional network namespaces.

In net/core/dev.c, dev_change_name() fails: call to device_rename()
return an (expected) -EINVAL error.
Maybe we should add a test on the net to only call it in init_net?

rollback:
- err = device_rename(&dev->dev, dev->name);
- if (err) {
- memcpy(dev->name, oldname, IFNAMSIZ);
- return err;
+ if (net == &init_net) {
+ err = device_rename(&dev->dev, dev->name);
+ if (err) {
+ memcpy(dev->name, oldname, IFNAMSIZ);
+ return err;
+ }
}

Otherwise,

Acked-by: Benjamin Thery <[email protected]>

Thanks,
Benjamin

>
> Thanks, Eric! Thanks, Benjamin!
>
> -serge
>
>


--
B e n j a m i n T h e r y - BULL/DT/Open Software R&D

http://www.bull.com

2008-10-23 12:00:45

by Benjamin Thery

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

Eric W. Biederman wrote:
> Benjamin Thery <[email protected]> writes:
>
>> Support for network namespaces in mainline is pretty complete for
>> some time now, but there is still this issue with sysfs that prevents
>> more people to use it easily.
>
> Ben your patchset is completely inappropriate.
>
> Temporarily adding elements to the ABI that we intend to remove
> is not a proper solution to this problem.

In fact, I feared that would be an issue before I sent the patches.
Thanks for confirming this.

But, as this alternative was discussed a bit after Al's comments on your
last patches, I had to give it a try and code a proof-of-concept.

Benjamin

>
> That user space visible ida you add is a namespace identifier that breaks
> nested containers and migration. It is very very very wrong.
>
> Eric
>
>
>


--
B e n j a m i n T h e r y - BULL/DT/Open Software R&D

http://www.bull.com

2008-10-23 15:44:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] netns: Coexist with the sysfs limitations

Benjamin Thery <[email protected]> writes:

> Serge E. Hallyn wrote:
>> Quoting Eric W. Biederman ([email protected]):
>>> To make testing of the network namespace simpler allow
>>> the network namespace code and the sysfs code to be
>>> compiled and run at the same time. To do this only
>>> virtual devices are allowed in the additional network
>>> namespaces and those virtual devices are not placed
>>> in the kobject tree.
>>>
>>> Since virtual devices don't actually do anything interesting
>>> hardware wise that needs device management there should
>>> be no loss in keeping them out of the kobject tree and
>>> by implication sysfs. The gain in ease of testing
>>> and code coverage should be significant.
>>>
>>> I.e. people running distributions that make it next to
>>> impossible to boot without sysfs should at be able to
>>> boot a test kernel now.
>>>
>>> Plus no ABIs are harmed with this patch.
>
>>> Signed-off-by: Eric W. Biederman <[email protected]>
>>
>> Duh.
>>
>> Tested-by: Serge Hallyn <[email protected]>
>> Acked-by: Serge Hallyn <[email protected]>
>
> Oh, this patch is short, clean, and the limitation introduced isn't too
> annoying for testing netns right now.
>
> At least, my proposal provoked some reactions :)

Yes.

> BTW, there's a second limitation with your patch:
> we can't rename the net devices in the additional network namespaces.
>
> In net/core/dev.c, dev_change_name() fails: call to device_rename() return an
> (expected) -EINVAL error.
> Maybe we should add a test on the net to only call it in init_net?

Yes. Good catch.

Eric

2008-10-23 15:54:43

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries

Benjamin Thery <[email protected]> writes:

> In fact, I feared that would be an issue before I sent the patches.
> Thanks for confirming this.
>
> But, as this alternative was discussed a bit after Al's comments on your
> last patches, I had to give it a try and code a proof-of-concept.

Yes. We have a lot of reasons now for why that approach is a
bad idea. Which should help a lot when it comes to handling sysfs
properly.

Eric

2008-10-23 16:05:07

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] netns: Coexist with the sysfs limitations v2


To make testing of the network namespace simpler allow
the network namespace code and the sysfs code to be
compiled and run at the same time. To do this only
virtual devices are allowed in the additional network
namespaces and those virtual devices are not placed
in the kobject tree.

Since virtual devices don't actually do anything interesting
hardware wise that needs device management there should
be no loss in keeping them out of the kobject tree and
by implication sysfs. The gain in ease of testing
and code coverage should be significant.

Changelog:

v2: As pointed out by Benjamin Thery it only makes sense to call
device_rename in the initial network namespace for now.

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Benjamin Thery <[email protected]>
Tested-by: Serge Hallyn <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Acked-by: Daniel Lezcano <[email protected]>
---
net/Kconfig | 2 +-
net/core/dev.c | 25 ++++++++++++++++++++-----
net/core/net-sysfs.c | 7 +++++++
3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index d789d79..8c3d97c 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -27,7 +27,7 @@ menu "Networking options"
config NET_NS
bool "Network namespace support"
default n
- depends on EXPERIMENTAL && !SYSFS && NAMESPACES
+ depends on EXPERIMENTAL && NAMESPACES
help
Allow user space to create what appear to be multiple instances
of the network stack.
diff --git a/net/core/dev.c b/net/core/dev.c
index b8a4fd0..0e38594 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -924,10 +924,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
strlcpy(dev->name, newname, IFNAMSIZ);

rollback:
- ret = device_rename(&dev->dev, dev->name);
- if (ret) {
- memcpy(dev->name, oldname, IFNAMSIZ);
- return ret;
+ /* For now only devices in the initial network namespace
+ * are in sysfs.
+ */
+ if (net == &init_net) {
+ ret = device_rename(&dev->dev, dev->name);
+ if (ret) {
+ memcpy(dev->name, oldname, IFNAMSIZ);
+ return ret;
+ }
}

write_lock_bh(&dev_base_lock);
@@ -4449,6 +4454,15 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
if (dev->features & NETIF_F_NETNS_LOCAL)
goto out;

+#ifdef CONFIG_SYSFS
+ /* Don't allow real devices to be moved when sysfs
+ * is enabled.
+ */
+ err = -EINVAL;
+ if (dev->dev.parent)
+ goto out;
+#endif
+
/* Ensure the device has been registrered */
err = -EINVAL;
if (dev->reg_state != NETREG_REGISTERED)
@@ -4506,6 +4520,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
*/
dev_addr_discard(dev);

+ netdev_unregister_kobject(dev);
+
/* Actually switch the network namespace */
dev_net_set(dev, net);

@@ -4522,7 +4538,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
}

/* Fixup kobjects */
- netdev_unregister_kobject(dev);
err = netdev_register_kobject(dev);
WARN_ON(err);

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 92d6b94..85cb8bd 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -476,6 +476,10 @@ void netdev_unregister_kobject(struct net_device * net)
struct device *dev = &(net->dev);

kobject_get(&dev->kobj);
+
+ if (dev_net(net) != &init_net)
+ return;
+
device_del(dev);
}

@@ -501,6 +505,9 @@ int netdev_register_kobject(struct net_device *net)
#endif
#endif /* CONFIG_SYSFS */

+ if (dev_net(net) != &init_net)
+ return 0;
+
return device_add(dev);
}

--
1.5.3.rc6.17.g1911

2008-10-27 19:42:04

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] netns: Coexist with the sysfs limitations v2

From: [email protected] (Eric W. Biederman)
Date: Thu, 23 Oct 2008 08:56:08 -0700

> To make testing of the network namespace simpler allow
> the network namespace code and the sysfs code to be
> compiled and run at the same time. To do this only
> virtual devices are allowed in the additional network
> namespaces and those virtual devices are not placed
> in the kobject tree.
>
> Since virtual devices don't actually do anything interesting
> hardware wise that needs device management there should
> be no loss in keeping them out of the kobject tree and
> by implication sysfs. The gain in ease of testing
> and code coverage should be significant.
>
> Changelog:
>
> v2: As pointed out by Benjamin Thery it only makes sense to call
> device_rename in the initial network namespace for now.
>
> Signed-off-by: Eric W. Biederman <[email protected]>
> Acked-by: Benjamin Thery <[email protected]>
> Tested-by: Serge Hallyn <[email protected]>
> Acked-by: Serge Hallyn <[email protected]>
> Acked-by: Daniel Lezcano <[email protected]>

So let's figure out what happens with this patch.
I'm personally ok with the change, the question is when
and where.

My net-2.6 tree was closed to new features long ago, so I really
don't want to try to merge this sucker into 2.6.28-rcX :-) But if
you guys think that is prudent, feel free to submit it directly to
Linus and add my signoff:

Signed-off-by: David S. Miller <[email protected]>

otherwise if we shoot for 2.6.29 I would suggest that we wait until
the merge window to see if the sysfs issues get sorted, and if not
we slip this patch into to tree instead.

Let me know what you guys plan to do with this.

2008-10-27 20:25:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] netns: Coexist with the sysfs limitations v2

David Miller <[email protected]> writes:

> From: [email protected] (Eric W. Biederman)
> Date: Thu, 23 Oct 2008 08:56:08 -0700
>
>> To make testing of the network namespace simpler allow
>> the network namespace code and the sysfs code to be
>> compiled and run at the same time. To do this only
>> virtual devices are allowed in the additional network
>> namespaces and those virtual devices are not placed
>> in the kobject tree.
>>
>> Since virtual devices don't actually do anything interesting
>> hardware wise that needs device management there should
>> be no loss in keeping them out of the kobject tree and
>> by implication sysfs. The gain in ease of testing
>> and code coverage should be significant.
>>
>> Changelog:
>>
>> v2: As pointed out by Benjamin Thery it only makes sense to call
>> device_rename in the initial network namespace for now.
>>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>> Acked-by: Benjamin Thery <[email protected]>
>> Tested-by: Serge Hallyn <[email protected]>
>> Acked-by: Serge Hallyn <[email protected]>
>> Acked-by: Daniel Lezcano <[email protected]>
>
> So let's figure out what happens with this patch.
> I'm personally ok with the change, the question is when
> and where.
>
> My net-2.6 tree was closed to new features long ago, so I really
> don't want to try to merge this sucker into 2.6.28-rcX :-) But if
> you guys think that is prudent, feel free to submit it directly to
> Linus and add my signoff:
>
> Signed-off-by: David S. Miller <[email protected]>
>
> otherwise if we shoot for 2.6.29 I would suggest that we wait until
> the merge window to see if the sysfs issues get sorted, and if not
> we slip this patch into to tree instead.
>
> Let me know what you guys plan to do with this.

What I was thinking is that it goes into your tree for 2.6.29. Allowing
for better test coverage in the short term, and removing the pressure
to do a hack job on sysfs just to reduce the pain of testing.

The patch was sent during the merge window just because that is
when the conversation was happening.

I guess the pain with sysfs is having multiple patches in different
trees in this area causing conflicts in linux-next. Ugh. I can see
why you would want to hold back. On the contrary point of view we
need that patch in someones tree or else we might as well merge it
now, if the plan is to merge it without it sitting in anyone's
development tree.

So my plan is I'm not going to worry about that patch, and leave it to
Ben and Daniel (if it needs a retransmit). If it happens to merge
into net-next and that causes conflicts when we do a good job on
sysfs I will handle it.

Eric

2008-10-28 00:51:29

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] netns: Coexist with the sysfs limitations v2

From: [email protected] (Eric W. Biederman)
Date: Mon, 27 Oct 2008 13:19:27 -0700

> What I was thinking is that it goes into your tree for 2.6.29. Allowing
> for better test coverage in the short term, and removing the pressure
> to do a hack job on sysfs just to reduce the pain of testing.

Fair enough. I'll add it to my tree.