2008-11-19 21:50:04

by Michael Kerrisk

Subject: Current state of Network Namespaces (NETNS, CLONE_NEWNET)?

Sorry for the shotgun mail, but in the end, it's
not clear who can best answer my question(s).

I'm currently trying to add documentation of all of
the undocumented CLONE_* flags. One of these is
CLONE_NEWNET, and I could use (quite a lot of) help.

My questions:

What is the current state of the network namespace
implementation? Is it complete?

What objects are considered part of the network
namespace, and therefore distinct for a new network
namespace?

Is there any documentation for network namespaces
already?

Are there any test programs for network namespaces?

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html


2008-11-20 01:46:24

by Eric W. Biederman

Subject: Re: Current state of Network Namespaces (NETNS, CLONE_NEWNET)?

Michael Kerrisk <[email protected]> writes:

> Sorry for the shotgun mail, but in the end, it's
> not clear who can best answer my question(s).
>
> I'm currently trying to add documentation of all of
> the undocumented CLONE_* flags. One of these is
> CLONE_NEWNET, and I could use (quite a lot of) help.
>
> My questions:
>
> What is the current state of the network namespace
> implementation? Is it complete?

No. It is fairly close though and there is general agreement
on what it is.

ipv4 and ipv6 are mostly complete and useable.
ip tables support is in progress.
sysfs support is in progress.

decnet and other protocols are possible but there is not currently
any active work in that direction.

> What objects are considered part of the network
> namespace, and therefore distinct for a new network
> namespace?

A network namespace is to user space a new logical
instance of the kernel networking stack.

The full kernel networking stack is available in the
initial network namespace. A subset of the kernel
networking stack is available in other network namespaces
depending upon how much code has been converted.

Network devices live in exactly one network namespace.

> Is there any documentation for network namespaces
> already?

Not much. Nor should it need much unique documentation.

Currently the truly unique command is:
ip link set <netdev> netns <pid>

This moves a network device from one network namespace to another.
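
For illustration only, the command above could be driven from a script.
The following is a minimal Python sketch: the helper names
move_netdev_argv and move_netdev are hypothetical, and it assumes an
installed ip tool that accepts the "ip link set <netdev> netns <pid>"
form quoted above.

```python
import subprocess
from typing import List


def move_netdev_argv(netdev: str, pid: int) -> List[str]:
    """Build argv for: ip link set <netdev> netns <pid> (hypothetical helper)."""
    # Reject names that could not be a network device name.
    if not netdev or any(c.isspace() or c == "/" for c in netdev):
        raise ValueError(f"invalid network device name: {netdev!r}")
    if pid <= 0:
        raise ValueError(f"invalid pid: {pid}")
    return ["ip", "link", "set", netdev, "netns", str(pid)]


def move_netdev(netdev: str, pid: int) -> None:
    """Run the command; this needs privilege and raises on a non-zero exit."""
    subprocess.run(move_netdev_argv(netdev, pid), check=True)
```

Building the argument vector separately from running it keeps the
privileged step small and lets the command be inspected before it is run.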

There are the veth pair network devices, designed so that you can put
one end in one network namespace and the other end in another network
namespace.

There is the macvlan driver, which can be used to create multiple MAC addresses
for your Ethernet devices, allowing native speed inside a network namespace
on a machine with only one NIC.

There is the fact that /proc/net is now unique per network namespace.
There are the interesting games we play with /proc/sys/ so that we have
per-network-namespace sysctls.
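
One way to observe the per-namespace /proc/net is to list the devices it
reports. The following is a minimal Python sketch, assuming the usual
/proc/net/dev layout of two header lines followed by one "name: counters"
line per device; the function names are hypothetical.

```python
from typing import List


def parse_proc_net_dev(text: str) -> List[str]:
    """Return the interface names found in /proc/net/dev-style text."""
    names = []
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, _counters = line.partition(":")
        if sep:  # skip anything that is not a "name: counters" line
            names.append(name.strip())
    return names


def visible_interfaces(path: str = "/proc/net/dev") -> List[str]:
    """List the interfaces visible in the caller's network namespace."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())
```

Calling visible_interfaces() from processes in two different network
namespaces should return different lists; a freshly created namespace
typically shows only the loopback device.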

Other network-namespace-specific work under discussion:
- Unix domain sockets across network namespaces.
  This is doable, but we haven't considered all of the technical details.
- The ongoing discussion about how we provide a more manageable interface
  to network namespaces for people doing the whole linux-vrf thing.

Eric

2008-11-20 02:46:50

by Alexey Dobriyan

Subject: Re: Current state of Network Namespaces (NETNS, CLONE_NEWNET)?

On Wed, Nov 19, 2008 at 05:37:35PM -0800, Eric W. Biederman wrote:
> Michael Kerrisk <[email protected]> writes:
>
> > Sorry for the shotgun mail, but in the end, it's
> > not clear who can best answer my question(s).
> >
> > I'm currently trying to add documentation of all of
> > the undocumented CLONE_* flags. One of these is
> > CLONE_NEWNET, and I could use (quite a lot of) help.
> >
> > My questions:
> >
> > What is the current state of the network namespace
> > implementation? Is it complete?
>
> No. It is fairly close though and there is general agreement
> on what it is.
>
> ipv4 and ipv6 are mostly complete and useable.
> ip tables support is in progress.

iptables will be mostly complete in 2.6.28.

2008-11-20 07:54:54

by Daniel Lezcano

Subject: Re: Current state of Network Namespaces (NETNS, CLONE_NEWNET)?

Michael Kerrisk wrote:
> Sorry for the shotgun mail, but in the end, it's
> not clear who can best answer my question(s).
>
> I'm currently trying to add documentation of all of
> the undocumented CLONE_* flags. One of these is
> CLONE_NEWNET, and I could use (quite a lot of) help.
>
> My questions:
>
> What is the current state of the network namespace
> implementation? Is it complete?

It is not complete but mostly usable for ipv4 and ipv6.

There is a network namespace status page that I filled in at:

http://lxc.sourceforge.net/network/status.php

It should be up-to-date.

> What objects are considered part of the network
> namespace, and therefore distinct for a new network
> namespace?

The network namespace provides isolation from layer 2 up through the upper layers.

> Is there any documentation for network namespaces
> already?

http://lxc.sourceforge.net/network.php
http://lxc.sourceforge.net/doc/sigops/appcr.pdf

> Are there any test programs for network namespaces?

http://sourceforge.net/projects/lxc/

Follow the README page. It is still in development but mostly usable;
any feedback is welcome :)

2008-11-20 08:05:05

by Subrata Modak

Subject: Re: Current state of Network Namespaces (NETNS, CLONE_NEWNET)?


On Thu, 2008-11-20 at 08:54 +0100, Daniel Lezcano wrote:
> Michael Kerrisk wrote:
> > Sorry for the shotgun mail, but in the end, it's
> > not clear who can best answer my question(s).
> >
> > I'm currently trying to add documentation of all of
> > the undocumented CLONE_* flags. One of these is
> > CLONE_NEWNET, and I could use (quite a lot of) help.
> >
> > My questions:
> >
> > What is the current state of the network namespace
> > implementation? Is it complete?
>
> It is not complete but mostly usable for ipv4 and ipv6.
>
> There is a network namespace status page that I filled in at:
>
> http://lxc.sourceforge.net/network/status.php
>
> It should be up-to-date.
>
> > What objects are considered part of the network
> > namespace, and therefore distinct for a new network
> > namespace?
>
> The network namespace provides isolation from layer 2 up through the upper layers.
>
> > Is there any documentation for network namespaces
> > already?
>
> http://lxc.sourceforge.net/network.php
> http://lxc.sourceforge.net/doc/sigops/appcr.pdf
>
> > Are there any test programs for network namespaces?
>
> http://sourceforge.net/projects/lxc/

And also at:

http://ltp.cvs.sourceforge.net/viewvc/ltp/ltp/testcases/kernel/containers/

Regards--
Subrata

>
> Follow the README page. It is still in development but mostly usable;
> any feedback is welcome :)
>

2008-11-20 18:20:35

by Michael Kerrisk

Subject: CLONE_NEWNET documentation

Based on my reading of some of the kernel source, various
documentation that I've now read, and comments I received
from people to my earlier mail ("Current state of Network
Namespaces (NETNS, CLONE_NEWNET)?"), I've written the patch
below to document the CLONE_NEWNET clone(2) flag.
Fixes and suggestions for improvements welcome.

Cheers,

Michael

CLONE_NEWNET (since Linux 2.6.24)
(The implementation of this flag is not yet complete,
but probably will be mostly complete by about
Linux 2.6.28.)

If CLONE_NEWNET is set, then create the process in
a new network namespace. If this flag is not set,
then (as with fork(2)), the process is created in
the same network namespace as the calling process.
This flag is intended for the implementation of
containers.

A network namespace provides an isolated view of
the networking stack (network device interfaces,
IPv4 and IPv6 protocol stacks, IP routing tables,
firewall rules, the /proc/net and /sys/class/net
directory trees, sockets, etc.). A physical network
device can live in exactly one network namespace.
A virtual network device ("veth") pair provides a
pipe-like abstraction that can be used to create
tunnels between network namespaces, and can be used
to create a bridge to a physical network device in
another namespace.

Use of this flag requires: a kernel configured
with the CONFIG_NET_NS option and that the process
be privileged (CAP_SYS_ADMIN).
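
To make the description above concrete, here is a minimal Python sketch
of entering a new network namespace. Note the substitution: it calls
unshare(2) through ctypes rather than clone(2), on the assumption that
the running kernel accepts CLONE_NEWNET for unshare(2) as well. The
function name unshare_netns is hypothetical, and the constant is the
value defined in <linux/sched.h>.

```python
import ctypes

CLONE_NEWNET = 0x40000000  # flag value from <linux/sched.h>


def unshare_netns() -> int:
    """Move the calling process into a new network namespace.

    Returns 0 on success, or the errno value reported by the kernel.
    Uses unshare(2) instead of clone(2); this is an assumption of the
    sketch, not the interface documented in the patch.
    """
    libc = ctypes.CDLL(None, use_errno=True)  # the C library of this process
    if libc.unshare(CLONE_NEWNET) == 0:
        return 0
    return ctypes.get_errno()
```

A caller would treat a return value of 0 as success and anything else as
an errno value; an unprivileged caller would be expected to get EPERM,
in line with the CAP_SYS_ADMIN requirement stated above.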

--- a/man2/clone.2
+++ b/man2/clone.2
@@ -286,10 +285,41 @@ and
configuration options and that the process be privileged
.RB ( CAP_SYS_ADMIN ).
This flag can't be specified in conjunction with
.BR CLONE_SYSVSEM .
.TP
+.BR CLONE_NEWNET " (since Linux 2.6.24)"
+(The implementation of this flag is not yet complete,
+but probably will be mostly complete by about Linux 2.6.28.)
+
+If
+.B CLONE_NEWNET
+is set, then create the process in a new network namespace.
+If this flag is not set, then (as with
+.BR fork (2)),
+the process is created in the same network namespace as
+the calling process.
+This flag is intended for the implementation of containers.
+
+A network namespace provides an isolated view of the networking stack
+(network device interfaces, IPv4 and IPv6 protocol stacks,
+IP routing tables, firewall rules, the
+.I /proc/net
+and
+.I /sys/class/net
+directory trees, sockets, etc.).
+A physical network device can live in exactly one
+network namespace.
+A virtual network device ("veth") pair provides a pipe-like abstraction
+that can be used to create tunnels between network namespaces,
+and can be used to create a bridge to a physical network device
+in another namespace.
+
+Use of this flag requires: a kernel configured with the
+.B CONFIG_NET_NS
+option and that the process be privileged
+.RB ( CAP_SYS_ADMIN ).
+.TP
.BR CLONE_NEWNS " (since Linux 2.4.19)"
Start the child in a new mount namespace.

Every process lives in a mount namespace.
The
@@ -822,10 +852,18 @@ but the kernel was not configured with the
and
.BR CONFIG_IPC_NS
options.
.TP
.B EINVAL
+.BR CLONE_NEWNET
+was specified in
+.IR flags ,
+but the kernel was not configured with the
+.B CONFIG_NET_NS
+option.
+.TP
+.B EINVAL
.BR CLONE_NEWPID
was specified in
.IR flags ,
but the kernel was not configured with the
.B CONFIG_PID_NS
@@ -844,10 +882,11 @@ Cannot allocate sufficient memory to allocate a task structure for the
child, or to copy those parts of the caller's context that need to be
copied.
.TP
.B EPERM
.BR CLONE_NEWIPC ,
+.BR CLONE_NEWNET ,
.BR CLONE_NEWNS ,
.BR CLONE_NEWPID ,
or
.BR CLONE_NEWUTS
was specified by a non-root process (process without \fBCAP_SYS_ADMIN\fP).
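
As a rough illustration of the two error cases the patch documents for
CLONE_NEWNET, the following Python sketch maps an errno value to the
cause given in the text above. The function name explain_newnet_error is
hypothetical, and EINVAL in particular can have other causes that this
sketch does not distinguish.

```python
import errno
import os


def explain_newnet_error(err: int) -> str:
    """Map an errno from a CLONE_NEWNET request to the documented cause."""
    if err == errno.EINVAL:
        # Cause documented in the patch; other causes of EINVAL exist.
        return "EINVAL: kernel may not be configured with CONFIG_NET_NS"
    if err == errno.EPERM:
        return "EPERM: caller lacks CAP_SYS_ADMIN"
    # Anything else: fall back to the C library's message.
    return f"{errno.errorcode.get(err, err)}: {os.strerror(err)}"
```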