To: Matt Helsley <matthltc@us.ibm.com>
Cc: Containers <containers@lists.osdl.org>, linux-nfs@vger.kernel.org
Subject: Re: [RFC][PATCH] Improve NFS use of network and mount namespaces
References: <20090512215138.GD3912@us.ibm.com>
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Tue, 12 May 2009 17:01:58 -0700
In-Reply-To: <20090512215138.GD3912@us.ibm.com> (Matt Helsley's message of "Tue\, 12 May 2009 14\:51\:38 -0700")
Message-ID: <m1fxf97tvt.fsf@fess.ebiederm.org>
Content-Type: text/plain; charset=us-ascii
Sender: linux-nfs-owner@vger.kernel.org
MIME-Version: 1.0

Matt Helsley <matthltc@us.ibm.com> writes:

> Sun RPC currently opens sockets from the initial network namespace making it
> impossible to restrict which NFS servers a container may interact with.
>
> For example, the NFS server at 10.0.0.3 reachable from the initial namespace
> will always be used even if an entirely different server with the address
> 10.0.0.3 is reachable from a container's network namespace. Hence network
> namespaces cannot be used to restrict the network access of a container as long
> as the RPC code opens sockets using the initial network namespace. This is
> in stark contrast to other protocols like HTTP where the sockets are created in
> their proper namespaces because kernel threads are not used to open sockets for
> client network IO.
>
> We may plausibly end up with namespaces created by:
> I) The administrator may mount 10.0.0.3:/export_foo from init's
> container, clone the mount namespace, and unmount from the original
> mount namespace.
>
> II) The administrator may start a task which clones the mount namespace
> before mounting 10.0.0.3:/export_foo.
>
> Proposed Solution:
>
> The network namespace of the task that did the mount best defines which server
> the "administrator", whether in a container or not, expects to work with.
> When the mount is done inside a container then that is the network namespace 
> to use. When the mount is done prior to creating the container then that's the 
> namespace that should be used.
>
> This allows system administrators to isolate network traffic generated by NFS
> clients by mounting after creating a container. If partial isolation is desired
> then the administrator may mount before creating a container with a new network
> namespace. In each case the RPC packets would originate from a consistent
> namespace.
>
> One way to ensure consistent namespace usage would be to hold a reference to
> the original network namespace as long as the mount exists. This naturally 
> suggests storing the network namespace reference in the NFS superblock. 
> However, it may be better to store it with the RPC transport itself since
> it is directly responsible for (re)opening the sockets.
>
> This patch adds a reference to the network namespace to the RPC
> transport. When the NFS export is mounted the network namespace of
> the current task establishes which namespace to reference. That
> reference is stored in the RPC transport and used to open sockets
> whenever a new socket is required.

Matt, this may be the basis of something, and the problem is real.
However, it is clear you have missed a lot of details.

So could you first address this problem in nfs_get_sb by
denying the mount if we are not in the initial network namespace?

I.e.

/* Refuse NFS mounts requested from outside the initial network namespace. */
if (current->nsproxy->net_ns != &init_net)
	return -EINVAL;

That should be a lot simpler to get right, and it at least gives reliable
and predictable semantics.
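
To see from user space which side of that check a task would land on,
here is a rough sketch. It is not part of any patch, and it assumes a
kernel that exposes per-process namespace links such as
/proc/<pid>/ns/net. The helper name same_namespace() is made up for
illustration. It relies on two namespace links referring to the same
namespace exactly when stat() reports the same device and inode for both.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns 1 if both paths refer to the same namespace, 0 if they
 * refer to different ones, and -1 if either path cannot be stat()ed
 * (missing /proc support, no permission, and so on).
 */
int same_namespace(const char *path_a, const char *path_b)
{
	struct stat a, b;

	/* stat() follows the ns link, so st_dev/st_ino identify the namespace. */
	if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
		return -1;

	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

Calling same_namespace("/proc/self/ns/net", "/proc/1/ns/net") approximates
the kernel-side comparison against init_net, assuming pid 1 still lives in
the initial network namespace. Reading pid 1's link usually needs
privilege, so a -1 result should be treated as "unknown", not "different".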


Eric