Return-Path:
Received: from out02.mta.xmission.com ([166.70.13.232]:51798 "EHLO
	out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755794AbZEMACE (ORCPT );
	Tue, 12 May 2009 20:02:04 -0400
To: Matt Helsley
Cc: Containers, linux-nfs@vger.kernel.org
Subject: Re: [RFC][PATCH] Improve NFS use of network and mount namespaces
References: <20090512215138.GD3912@us.ibm.com>
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Tue, 12 May 2009 17:01:58 -0700
In-Reply-To: <20090512215138.GD3912@us.ibm.com> (Matt Helsley's message of "Tue\, 12 May 2009 14\:51\:38 -0700")
Message-ID:
Content-Type: text/plain; charset=us-ascii
Sender: linux-nfs-owner@vger.kernel.org
List-ID:
MIME-Version: 1.0

Matt Helsley writes:

> Sun RPC currently opens sockets from the initial network namespace making it
> impossible to restrict which NFS servers a container may interact with.
>
> For example, the NFS server at 10.0.0.3 reachable from the initial namespace
> will always be used even if an entirely different server with the address
> 10.0.0.3 is reachable from a container's network namespace. Hence network
> namespaces cannot be used to restrict the network access of a container as long
> as the RPC code opens sockets using the initial network namespace. This is
> in stark contrast to other protocols like HTTP where the sockets are created in
> their proper namespaces because kernel threads are not used to open sockets for
> client network IO.
>
> We may plausibly end up with namespaces created by:
>	I) The administrator may mount 10.0.0.3:/export_foo from init's
>	container, clone the mount namespace, and unmount from the original
>	mount namespace.
>
>	II) The administrator may start a task which clones the mount namespace
>	before mounting 10.0.0.3:/export_foo.
>
> Proposed Solution:
>
> The network namespace of the task that did the mount best defines which server
> the "administrator", whether in a container or not, expects to work with.
> When the mount is done inside a container then that is the network namespace
> to use. When the mount is done prior to creating the container then that's the
> namespace that should be used.
>
> This allows system administrators to isolate network traffic generated by NFS
> clients by mounting after creating a container. If partial isolation is desired
> then the administrator may mount before creating a container with a new network
> namespace. In each case the RPC packets would originate from a consistent
> namespace.
>
> One way to ensure consistent namespace usage would be to hold a reference to
> the original network namespace as long as the mount exists. This naturally
> suggests storing the network namespace reference in the NFS superblock.
> However, it may be better to store it with the RPC transport itself since
> it is directly responsible for (re)opening the sockets.
>
> This patch adds a reference to the network namespace to the RPC
> transport. When the NFS export is mounted the network namespace of
> the current task establishes which namespace to reference. That
> reference is stored in the RPC transport and used to open sockets
> whenever a new socket is required.

Matt. This may be the basis of something and the problem is real.

However it is clear you have missed a lot of details.

So could you first address this problem in nfs_get_sb by denying
the mount if we are not in the initial network namespace.

I.e.

	if (current->nsproxy->net_ns != &init_net)
		return -EINVAL;

That should be a lot simpler to get right and at least give
reliable and predictable semantics.

Eric