We can often specify the UTS namespace to use when starting an RPC client.
However, sometimes no UTS namespace is available (specifically during system
shutdown, as the last NFS mount in a container is unmounted), so fall
back to the initial UTS namespace.
Signed-off-by: Matt Helsley <[email protected]>
Cc: Cedric Le Goater <clg-NmTC/[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: [email protected]
Cc: Trond Myklebust <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Linux Containers <[email protected]>
---
net/sunrpc/clnt.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Index: linux-2.6.28/net/sunrpc/clnt.c
===================================================================
--- linux-2.6.28.orig/net/sunrpc/clnt.c
+++ linux-2.6.28/net/sunrpc/clnt.c
@@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
struct rpc_version *version;
struct rpc_clnt *clnt = NULL;
struct rpc_auth *auth;
+ struct new_utsname *uts_ns = init_utsname();
int err;
size_t len;
@@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
}
/* save the nodename */
- clnt->cl_nodelen = strlen(init_utsname()->nodename);
+ if (current->nsproxy != NULL)
+ uts_ns = utsname();
+ clnt->cl_nodelen = strlen(uts_ns->nodename);
if (clnt->cl_nodelen > UNX_MAXNODENAME)
clnt->cl_nodelen = UNX_MAXNODENAME;
- memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
+ memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
rpc_register_client(clnt);
return clnt;
--
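For readers skimming the diff, here is a minimal sketch of how the nodename-saving
part of rpc_new_client() reads once this patch is applied (surrounding code elided;
this is just the patched hunk rewritten as plain C for clarity):

	/* Default to the initial UTS namespace; during container teardown
	 * current->nsproxy may already be NULL, so only consult utsname()
	 * when the caller still has a namespace proxy. */
	struct new_utsname *uts_ns = init_utsname();

	if (current->nsproxy != NULL)
		uts_ns = utsname();

	/* save the nodename, truncated to the auth_unix limit */
	clnt->cl_nodelen = strlen(uts_ns->nodename);
	if (clnt->cl_nodelen > UNX_MAXNODENAME)
		clnt->cl_nodelen = UNX_MAXNODENAME;
	memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);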
Quoting Chuck Lever ([email protected]):
> On Jan 6, 2009, at 3:02 PM, Serge E. Hallyn wrote:
>> Quoting Matt Helsley ([email protected]):
>>> We can often specify the UTS namespace to use when starting an RPC
>>> client.
>>> However sometimes no UTS namespace is available (specifically during
>>> system
>>> shutdown as the last NFS mount in a container is unmounted) so fall
>>> back to the initial UTS namespace.
>>
>> So what happens if we take this patch and do nothing else?
>
> I thought the point of this was to prevent incorrect container nodenames
> from leaking onto the network.
But define "incorrect". If container B does an nfs mount, container C
is launched with a tree in that mount, container B dies, and container C
umounts it. Should the umount belong to container B (for having
mounted it), container C (for having umounted it), or the init_utsname
(for being the physical host/kernel)?
I get the feeling that the consensus on this thread is that init_utsname
is actually the best choice, but OTOH if I have 3 containers on my
host, for apache, mysql, and postfix servers, and each is doing
nfs mounts from a physically remote machine, maybe I care about
having them report separate nodenames?
(that's a question, I really don't know...)
-serge
On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > > > That implies to me you want to capture the value at mount time, and to
> > > > > pass it in to the rpc_call creation, and only at very specific well
> > > > > defined points where we interact with user space should we examine
> > > > > current->utsname(). At which point there should be no question
> > > > > of current->utsname() is valid as the user space process is alive.
> > > >
> > > > Why pretend that the filesystem is owned by a particular namespace? It
> > > > can, and will be shared among many containers...
> > >
> > > If the only purpose of this is to fill in the auth_unix cred then
> > > shouldn't it be part of whatever cred structures are passed around?
> >
> > So how does tracking it in a shared structure like the rpc_client help?
> > If you consider it to be part of the cred, then it needs to be tracked
> > in the cred...
>
> Right, that's what I meant.
>
> It seems like overkill, though. Does anyone actually care whether these
> names are right?
That's certainly a tempting angle. However, we may not "control" the
server code -- couldn't there be some oddball (maybe even proprietary)
NFS servers out there that users do care about interacting with?
Cheers,
-Matt Helsley
On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote:
> It seems like overkill, though. Does anyone actually care whether these
> names are right?
I very much doubt it. Particularly since the actual sockets used by the
RPC client are always owned by the init namespace (they are established
from a workqueue process), and so the source IP address will always be
that of the init namespace.
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
Matt Helsley <[email protected]> writes:
> On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote:
>> On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote:
>> > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
>> > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
>> > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
>> > > > > That implies to me you want to capture the value at mount time, and to
>> > > > > pass it in to the rpc_call creation, and only at very specific well
>> > > > > defined points where we interact with user space should we examine
>> > > > > current->utsname(). At which point there should be no question
>> > > > > of current->utsname() is valid as the user space process is alive.
>> > > >
>> > > > Why pretend that the filesystem is owned by a particular namespace? It
>> > > > can, and will be shared among many containers...
Sounds right. Still, as with the owner of a file, it can happen that some
containers are more correct than others. Especially in the context of
mount merging and the other sophisticated caching that happens in NFS,
this increasingly sounds like something that belongs in the cred, as
that is where it is used.
>> > > If the only purpose of this is to fill in the auth_unix cred then
>> > > shouldn't it be part of whatever cred structures are passed around?
>> >
>> > So how does tracking it in a shared structure like the rpc_client help?
>> > If you consider it to be part of the cred, then it needs to be tracked
>> > in the cred...
>>
>> Right, that's what I meant.
>>
>> It seems like overkill, though. Does anyone actually care whether these
>> names are right?
>
> That's certainly a tempting angle. However we may not "control" the
> server code -- couldn't there be some oddball (maybe even proprietary)
> NFS servers out there that users do care about interacting with?
Matt, could you look at what it will take to do the right thing from
the network namespace side of things as well? I believe it is going
to require the same level of understanding of the interactions in the code
to get there.
For the network namespace we should cache it at mount or server
startup and use it until we are done. In a network namespace context
there are good reasons for that, because talking to 10.0.0.1 on one
network may not be the same machine as talking to 10.0.0.1 on another
network. NFS re-establishes TCP connections if the connection to the
server breaks, doesn't it? Or is that left to user space?
Eric
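Purely as a hypothetical illustration of the "cache it at mount or server
startup" idea on the network namespace side (the structure and function names
below are invented for this sketch, not existing kernel API): the transport
could pin the creating task's network namespace for its whole lifetime, so any
later reconnect happens in the namespace the mount was made from.

	/* Hypothetical sketch: pin the creating task's network namespace in
	 * the transport, so reconnects issued from kernel threads still
	 * resolve and connect in the namespace that created the transport. */
	struct rpc_xprt_example {
		struct net *xprt_net;   /* invented field name */
		/* ... */
	};

	static void xprt_example_init(struct rpc_xprt_example *xprt)
	{
		/* Runs in the mounting/startup task's context. */
		xprt->xprt_net = get_net(current->nsproxy->net_ns);
	}

	static void xprt_example_destroy(struct rpc_xprt_example *xprt)
	{
		put_net(xprt->xprt_net);
	}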
Quoting Matt Helsley ([email protected]):
> We can often specify the UTS namespace to use when starting an RPC client.
> However sometimes no UTS namespace is available (specifically during system
> shutdown as the last NFS mount in a container is unmounted) so fall
> back to the initial UTS namespace.
So what happens if we take this patch and do nothing else?
The only potential problem situation will be rpc requests
made on behalf of a container in which the last task has
exited, right? So let's say a container does an nfs mount
and then exits, causing an nfs umount request.
That umount request will now be sent with the wrong nodename.
Does that actually cause problems? Will the server use the
nodename to try to determine the client sending the request?
thanks,
-serge
> Signed-off-by: Matt Helsley <[email protected]>
> Cc: Cedric Le Goater <clg-NmTC/[email protected]>
> Cc: Linux Kernel Mailing List <[email protected]>
> Cc: [email protected]
> Cc: Trond Myklebust <[email protected]>
> Cc: Chuck Lever <[email protected]>
> Cc: Eric W. Biederman <[email protected]>
> Cc: Linux Containers <[email protected]>
>
> ---
> net/sunrpc/clnt.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.28/net/sunrpc/clnt.c
> ===================================================================
> --- linux-2.6.28.orig/net/sunrpc/clnt.c
> +++ linux-2.6.28/net/sunrpc/clnt.c
> @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
> struct rpc_version *version;
> struct rpc_clnt *clnt = NULL;
> struct rpc_auth *auth;
> + struct new_utsname *uts_ns = init_utsname();
> int err;
> size_t len;
>
> @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
> }
>
> /* save the nodename */
> - clnt->cl_nodelen = strlen(init_utsname()->nodename);
> + if (current->nsproxy != NULL)
> + uts_ns = utsname();
> + clnt->cl_nodelen = strlen(uts_ns->nodename);
> if (clnt->cl_nodelen > UNX_MAXNODENAME)
> clnt->cl_nodelen = UNX_MAXNODENAME;
> - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
> + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
> rpc_register_client(clnt);
> return clnt;
>
>
> --
On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> Quoting Matt Helsley ([email protected]):
> > We can often specify the UTS namespace to use when starting an RPC client.
> > However sometimes no UTS namespace is available (specifically during system
> > shutdown as the last NFS mount in a container is unmounted) so fall
> > back to the initial UTS namespace.
>
> So what happens if we take this patch and do nothing else?
>
> The only potential problem situation will be rpc requests
> made on behalf of a container in which the last task has
> exited, right? So let's say a container did an nfs mount
> and then exits, causing an nfs umount request.
>
> That umount request will now be sent with the wrong nodename.
> Does that actually cause problems, will the server use the
> nodename to try and determine the client sending the request?
This is just the machine name in the auth_unix credential? The Linux
server ignores that completely (for the purpose of auth_unix
authentication, it identifies clients only by source IP address). I
suspect other servers also ignore it, but I don't know.
--b.
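For context, the "machine name" under discussion is the machinename field of
the AUTH_SYS (auth_unix) credential body defined by the ONC RPC spec (RFC 1831).
Roughly, rendered as a C-style sketch rather than the spec's XDR (this is not
the kernel's internal representation):

	/* Rough C rendering of the AUTH_SYS credential body from RFC 1831. */
	struct authsys_parms {
		uint32_t stamp;                /* arbitrary caller-chosen value    */
		char     machinename[255 + 1]; /* caller's node name, <= 255 bytes */
		uint32_t uid;                  /* caller's effective uid           */
		uint32_t gid;                  /* caller's effective gid           */
		uint32_t gids[16];             /* supplementary gids, <= 16        */
		uint32_t gids_len;             /* number of valid entries in gids  */
	};

As noted above, the Linux server ignores machinename for auth_unix and keys
only on the source IP address.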
>
> thanks,
> -serge
>
> > Signed-off-by: Matt Helsley <[email protected]>
> > Cc: Cedric Le Goater <clg-NmTC/[email protected]>
> > Cc: Linux Kernel Mailing List <[email protected]>
> > Cc: [email protected]
> > Cc: Trond Myklebust <[email protected]>
> > Cc: Chuck Lever <[email protected]>
> > Cc: Eric W. Biederman <[email protected]>
> > Cc: Linux Containers <[email protected]>
> >
> > ---
> > net/sunrpc/clnt.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6.28/net/sunrpc/clnt.c
> > ===================================================================
> > --- linux-2.6.28.orig/net/sunrpc/clnt.c
> > +++ linux-2.6.28/net/sunrpc/clnt.c
> > @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
> > struct rpc_version *version;
> > struct rpc_clnt *clnt = NULL;
> > struct rpc_auth *auth;
> > + struct new_utsname *uts_ns = init_utsname();
> > int err;
> > size_t len;
> >
> > @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
> > }
> >
> > /* save the nodename */
> > - clnt->cl_nodelen = strlen(init_utsname()->nodename);
> > + if (current->nsproxy != NULL)
> > + uts_ns = utsname();
> > + clnt->cl_nodelen = strlen(uts_ns->nodename);
> > if (clnt->cl_nodelen > UNX_MAXNODENAME)
> > clnt->cl_nodelen = UNX_MAXNODENAME;
> > - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
> > + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
> > rpc_register_client(clnt);
> > return clnt;
> >
> >
> > --
On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> Quoting Matt Helsley ([email protected]):
> > We can often specify the UTS namespace to use when starting an RPC client.
> > However sometimes no UTS namespace is available (specifically during system
> > shutdown as the last NFS mount in a container is unmounted) so fall
> > back to the initial UTS namespace.
>
> So what happens if we take this patch and do nothing else?
>
> The only potential problem situation will be rpc requests
> made on behalf of a container in which the last task has
> exited, right? So let's say a container did an nfs mount
> and then exits, causing an nfs umount request.
>
> That umount request will now be sent with the wrong nodename.
> Does that actually cause problems, will the server use the
> nodename to try and determine the client sending the request?
The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
userspace, not the kernel. The problem here is that because lazy mounts
exist, the lifetime of the RPC client may be longer than that of the
container. In addition, it may be shared among more than one container,
because superblocks can be shared.
One thing you need to be aware of here is that inode dirty data
writebacks may be initiated by completely different processes than the
one that dirtied the inode.
IOW: Aside from being extremely ugly, approaches like [PATCH 4/4], which
rely on being able to determine the container-specific node name at RPC
generation time, are therefore going to return incorrect values.
Trond
Quoting J. Bruce Fields ([email protected]):
> On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley ([email protected]):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> >
> > So what happens if we take this patch and do nothing else?
> >
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right? So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> >
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
>
> This is just the machine name in the auth_unix credential? The linux
> server ignores that completely (for the purpose of auth_unix
> authenication, it identifies clients only by source ip address). I
> suspect other servers also ignore it, but I don't know.
Thanks, that's what I was hoping...
Matt, have you audited the other rpc-based services? Do any
of them care?
-serge
Quoting Trond Myklebust ([email protected]):
> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley ([email protected]):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> >
> > So what happens if we take this patch and do nothing else?
> >
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right? So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> >
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
>
> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
> userspace, not the kernel. The problem here is that because lazy mounts
> exist, the lifetime of the RPC client may be longer than that of the
Right, that was what I was referring to.
> container. In addition, it may be shared among more than 1 container,
> because superblocks can be shared.
Good point. And in that case, what do we care about (even though
apparently we just might not care at all :) - who did the mount,
or who is using it?
In fact, one thing I noticed in Matt's patch 3 was that he copied
in the nodename verbatim, so a future hostname() call by the container
wouldn't be reflected; again, not sure if that would matter.
> One thing you need to be aware of here is that inode dirty data
> writebacks may be initiated by completely different processes than the
> one that dirtied the inode.
Right, but I *was* thinking that we wanted to associate the nodename
on the rpc calls with the hostname of the mounter, not the actor. Maybe
you'll tell me above that that is bogus.
> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
> rely on being able to determine the container-specific node name at RPC
> generation time are therefore going to return incorrect values.
So should we use patch 2/4, plus (as someone - was it you? - suggested)
using a DEFAULT instead of init_utsname()->nodename when
current->utsname() == NULL?
-serge
On Jan 6, 2009, at 3:02 PM, Serge E. Hallyn wrote:
> Quoting Matt Helsley ([email protected]):
>> We can often specify the UTS namespace to use when starting an RPC
>> client.
>> However sometimes no UTS namespace is available (specifically
>> during system
>> shutdown as the last NFS mount in a container is unmounted) so fall
>> back to the initial UTS namespace.
>
> So what happens if we take this patch and do nothing else?
I thought the point of this was to prevent incorrect container
nodenames from leaking onto the network.
> The only potential problem situation will be rpc requests
> made on behalf of a container in which the last task has
> exited, right? So let's say a container did an nfs mount
> and then exits, causing an nfs umount request.
>
> That umount request will now be sent with the wrong nodename.
> Does that actually cause problems, will the server use the
> nodename to try and determine the client sending the request?
>
> thanks,
> -serge
>
>> Signed-off-by: Matt Helsley <[email protected]>
>> Cc: Cedric Le Goater <clg-NmTC/[email protected]>
>> Cc: Linux Kernel Mailing List <[email protected]>
>> Cc: [email protected]
>> Cc: Trond Myklebust <[email protected]>
>> Cc: Chuck Lever <[email protected]>
>> Cc: Eric W. Biederman <[email protected]>
>> Cc: Linux Containers <[email protected]>
>>
>> ---
>> net/sunrpc/clnt.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6.28/net/sunrpc/clnt.c
>> ===================================================================
>> --- linux-2.6.28.orig/net/sunrpc/clnt.c
>> +++ linux-2.6.28/net/sunrpc/clnt.c
>> @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
>> struct rpc_version *version;
>> struct rpc_clnt *clnt = NULL;
>> struct rpc_auth *auth;
>> + struct new_utsname *uts_ns = init_utsname();
>> int err;
>> size_t len;
>>
>> @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
>> }
>>
>> /* save the nodename */
>> - clnt->cl_nodelen = strlen(init_utsname()->nodename);
>> + if (current->nsproxy != NULL)
>> + uts_ns = utsname();
>> + clnt->cl_nodelen = strlen(uts_ns->nodename);
>> if (clnt->cl_nodelen > UNX_MAXNODENAME)
>> clnt->cl_nodelen = UNX_MAXNODENAME;
>> - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
>> + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
>> rpc_register_client(clnt);
>> return clnt;
>>
>>
>> --
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:
> So should we use patch 2/4, plus (as someone - was it you? - suggested)
> using a DEFAULT instead of init_utsname()->nodename when
> current->utsname() == NULL?
No. I don't think that 2/4 is correct either. Basically, 2/4 is saying
that the container that first mounts the filesystem 'owns' it. However,
at the same time we know that the lifetime of the filesystem is in no
way bounded by the lifetime of the container, and that's what gets you
into trouble with 'umount' in the first place.
IMO, the current code is the most correct approach, in that it assumes
that the filesystems are owned by the 'init' namespace.
Trond
"Serge E. Hallyn" <[email protected]> writes:
> Quoting Trond Myklebust ([email protected]):
>> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
>> > Quoting Matt Helsley ([email protected]):
>> > > We can often specify the UTS namespace to use when starting an RPC client.
>> > > However sometimes no UTS namespace is available (specifically during system
>> > > shutdown as the last NFS mount in a container is unmounted) so fall
>> > > back to the initial UTS namespace.
>> >
>> > So what happens if we take this patch and do nothing else?
>> >
>> > The only potential problem situation will be rpc requests
>> > made on behalf of a container in which the last task has
>> > exited, right? So let's say a container did an nfs mount
>> > and then exits, causing an nfs umount request.
>> >
>> > That umount request will now be sent with the wrong nodename.
>> > Does that actually cause problems, will the server use the
>> > nodename to try and determine the client sending the request?
>>
>> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
>> userspace, not the kernel. The problem here is that because lazy mounts
>> exist, the lifetime of the RPC client may be longer than that of the
>
> Right that was what i was referring to.
>
>> container. In addition, it may be shared among more than 1 container,
>> because superblocks can be shared.
>
> Good point. And in that case what do we care about (even though
> apparently we just might not care at all :) - who did the mount,
> or who is using it?
>
> In fact one thing I noticed in Matt's patch 3 was that he copied
> in the nodename verbatim, so a future hostname() by the container
> wouldn't be reflected, again not sure if that would matter.
>
>> One thing you need to be aware of here is that inode dirty data
>> writebacks may be initiated by completely different processes than the
>> one that dirtied the inode.
>
> Right, but I *was* thinking that we wanted to associate the nodename
> on the rpc calls with the hostname of the mounter, not the actor. Maybe
> you'll tell me above that that is bogus.
>
>> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
>> rely on being able to determine the container-specific node name at RPC
>> generation time are therefore going to return incorrect values.
>
> So should we use patch 2/4, plus (as someone - was it you? - suggested)
> using a DEFAULT instead of init_utsname()->nodename when
> current->utsname() == NULL?
Is there any reason to believe that the kernel helper threads will ever
have a useful namespace value? I don't think so.
That implies to me that you want to capture the value at mount time and
pass it in to the rpc_call creation, and that only at very specific,
well-defined points where we interact with user space should we examine
current->utsname(). At those points there should be no question of
whether current->utsname() is valid, as the user-space process is alive.
Eric
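Purely as an illustration of the suggestion above (the structure and helper
below are invented for this sketch, not existing kernel API): the mount path
still runs in the mounting process's context, so it could snapshot the nodename
once and hand the copy down, and rpc_new_client() would then never need to look
at current->utsname() at all.

	/* Hypothetical sketch: capture the nodename at mount time, while
	 * current->utsname() is known to be valid, and pass the copy down
	 * to RPC client creation. */
	struct rpc_nodename_ctx {                        /* invented */
		char nodename[UNX_MAXNODENAME + 1];
	};

	static void rpc_capture_nodename(struct rpc_nodename_ctx *ctx)
	{
		/* Called from the mount(2) path, i.e. in user context. */
		strlcpy(ctx->nodename, utsname()->nodename,
			sizeof(ctx->nodename));
	}

	/* rpc_new_client() would then copy ctx->nodename into cl_nodename
	 * instead of dereferencing current->nsproxy at all. */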
On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> That implies to me you want to capture the value at mount time, and to
> pass it in to the rpc_call creation, and only at very specific well
> defined points where we interact with user space should we examine
> current->utsname(). At which point there should be no question
> of current->utsname() is valid as the user space process is alive.
Why pretend that the filesystem is owned by a particular namespace? It
can, and will be shared among many containers...
Trond
On Tue, 2009-01-06 at 15:20 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley ([email protected]):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> >
> > So what happens if we take this patch and do nothing else?
> >
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right? So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> >
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
>
> This is just the machine name in the auth_unix credential? The linux
> server ignores that completely (for the purpose of auth_unix
> authenication, it identifies clients only by source ip address). I
> suspect other servers also ignore it, but I don't know.
>
> --b.
I was wondering about this because I kept coming back to the question of
how the server could trust the name the client supplies (seems it
can't). This is very useful information -- it certainly suggests that
this patch would be sufficient.
Thanks!
Cheers,
-Matt Helsley
> >
> > thanks,
> > -serge
> >
> > > Signed-off-by: Matt Helsley <[email protected]>
> > > Cc: Cedric Le Goater <clg-NmTC/[email protected]>
> > > Cc: Linux Kernel Mailing List <[email protected]>
> > > Cc: [email protected]
> > > Cc: Trond Myklebust <[email protected]>
> > > Cc: Chuck Lever <[email protected]>
> > > Cc: Eric W. Biederman <[email protected]>
> > > Cc: Linux Containers <[email protected]>
> > >
> > > ---
> > > net/sunrpc/clnt.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > Index: linux-2.6.28/net/sunrpc/clnt.c
> > > ===================================================================
> > > --- linux-2.6.28.orig/net/sunrpc/clnt.c
> > > +++ linux-2.6.28/net/sunrpc/clnt.c
> > > @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
> > > struct rpc_version *version;
> > > struct rpc_clnt *clnt = NULL;
> > > struct rpc_auth *auth;
> > > + struct new_utsname *uts_ns = init_utsname();
> > > int err;
> > > size_t len;
> > >
> > > @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
> > > }
> > >
> > > /* save the nodename */
> > > - clnt->cl_nodelen = strlen(init_utsname()->nodename);
> > > + if (current->nsproxy != NULL)
> > > + uts_ns = utsname();
> > > + clnt->cl_nodelen = strlen(uts_ns->nodename);
> > > if (clnt->cl_nodelen > UNX_MAXNODENAME)
> > > clnt->cl_nodelen = UNX_MAXNODENAME;
> > > - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
> > > + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
> > > rpc_register_client(clnt);
> > > return clnt;
> > >
> > >
> > > --
On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:
> Quoting Trond Myklebust ([email protected]):
> > On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley ([email protected]):
> > > > We can often specify the UTS namespace to use when starting an RPC client.
> > > > However sometimes no UTS namespace is available (specifically during system
> > > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > > back to the initial UTS namespace.
> > >
> > > So what happens if we take this patch and do nothing else?
> > >
> > > The only potential problem situation will be rpc requests
> > > made on behalf of a container in which the last task has
> > > exited, right? So let's say a container did an nfs mount
> > > and then exits, causing an nfs umount request.
> > >
> > > That umount request will now be sent with the wrong nodename.
> > > Does that actually cause problems, will the server use the
> > > nodename to try and determine the client sending the request?
> >
> > The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
> > userspace, not the kernel. The problem here is that because lazy mounts
> > exist, the lifetime of the RPC client may be longer than that of the
>
> Right that was what i was referring to.
>
> > container. In addition, it may be shared among more than 1 container,
> > because superblocks can be shared.
>
> Good point. And in that case what do we care about (even though
> apparently we just might not care at all :) - who did the mount,
> or who is using it?
>
> In fact one thing I noticed in Matt's patch 3 was that he copied
> in the nodename verbatim, so a future hostname() by the container
> wouldn't be reflected, again not sure if that would matter.
I thought of this. I found the patches that added the nodename to the
RPC client struct, and the stale nodename was intentional. That's why I
preserved the copy rather than calling utsname() each time.
Cheers,
-Matt
On Tue, 2009-01-06 at 15:44 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley ([email protected]):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> >
> > So what happens if we take this patch and do nothing else?
> >
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right? So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> >
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
>
> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
> userspace, not the kernel. The problem here is that because lazy mounts
Ah, that's news to me. I thought userspace originated the umount, but
the kernel constructed and sent the corresponding RPC call.
> exist, the lifetime of the RPC client may be longer than that of the
> container. In addition, it may be shared among more than 1 container,
> because superblocks can be shared.
Right.
> One thing you need to be aware of here is that inode dirty data
> writebacks may be initiated by completely different processes than the
> one that dirtied the inode.
> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
> rely on being able to determine the container-specific node name at RPC
> generation time are therefore going to return incorrect values.
Yes, I was aware that the inode might be dirtied by another container. I
was thinking that, at least in the case of NFS, it makes sense to report
the node name of the container that did the original mount. Of course
this doesn't address the general RPC client case and, like patch 3, it
makes the superblock solution rather NFS-specific. That brings me to a
basic question: Are there any RPC clients in the kernel that do not
operate on behalf of NFS?
Thanks!
Cheers,
-Matt Helsley
On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > That implies to me you want to capture the value at mount time, and to
> > pass it in to the rpc_call creation, and only at very specific well
> > defined points where we interact with user space should we examine
> > current->utsname(). At which point there should be no question
> > of current->utsname() is valid as the user space process is alive.
>
> Why pretend that the filesystem is owned by a particular namespace? It
> can, and will be shared among many containers...
If the only purpose of this is to fill in the auth_unix cred then
shouldn't it be part of whatever cred structures are passed around?
But I doubt it's that important to get it right.
--b.
On Tue, 2009-01-06 at 15:53 -0600, Serge E. Hallyn wrote:
> Quoting J. Bruce Fields ([email protected]):
> > On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley ([email protected]):
> > > > We can often specify the UTS namespace to use when starting an RPC client.
> > > > However sometimes no UTS namespace is available (specifically during system
> > > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > > back to the initial UTS namespace.
> > >
> > > So what happens if we take this patch and do nothing else?
> > >
> > > The only potential problem situation will be rpc requests
> > > made on behalf of a container in which the last task has
> > > exited, right? So let's say a container did an nfs mount
> > > and then exits, causing an nfs umount request.
> > >
> > > That umount request will now be sent with the wrong nodename.
> > > Does that actually cause problems, will the server use the
> > > nodename to try and determine the client sending the request?
> >
> > This is just the machine name in the auth_unix credential? The linux
> > server ignores that completely (for the purpose of auth_unix
> > authenication, it identifies clients only by source ip address). I
> > suspect other servers also ignore it, but I don't know.
>
> Thanks, that's what i was hoping...
>
> Matt, have you audited the other rpc-based services? Do any
> of them care?
Frankly, I did not audit any of the RPC-based services to see if any
truly cared about the node name. That's actually a rather large scope
when you consider it -- I'd have to look at much more than just "Linux"
RPC clients. It seemed unsafe to assume _nobody_ would care after they
bothered to put it into the spec.
Hopefully I'm wrong though :).
Cheers,
-Matt
On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > That implies to me you want to capture the value at mount time, and to
> > > pass it in to the rpc_call creation, and only at very specific well
> > > defined points where we interact with user space should we examine
> > > current->utsname(). At which point there should be no question
> > > of current->utsname() is valid as the user space process is alive.
> >
> > Why pretend that the filesystem is owned by a particular namespace? It
> > can, and will be shared among many containers...
>
> If the only purpose of this is to fill in the auth_unix cred then
> shouldn't it be part of whatever cred structures are passed around?
So how does tracking it in a shared structure like the rpc_client help?
If you consider it to be part of the cred, then it needs to be tracked
in the cred...
Trond
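As a hypothetical sketch of what "tracking it in the cred" could look like
(field and function names are invented for illustration; the real auth_unix
cred layout is not shown): copy the caller's nodename into the auth_unix cred
when the cred is created, since that is the one point where current is known
to be the task actually issuing the request.

	/* Hypothetical sketch: carry the nodename in the per-caller cred
	 * instead of the shared rpc_clnt. */
	struct unx_cred_example {
		struct rpc_cred cred;
		char            uc_nodename[UNX_MAXNODENAME + 1];
		/* ... uid/gid fields as in the real auth_unix cred ... */
	};

	static void unx_cred_capture_nodename(struct unx_cred_example *uc)
	{
		/* Runs at cred creation time, in the requesting task's
		 * context, so current->utsname() is meaningful here. */
		strlcpy(uc->uc_nodename, utsname()->nodename,
			sizeof(uc->uc_nodename));
	}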
On Tue, 2009-01-06 at 15:18 -0800, Matt Helsley wrote:
> Yes, I was aware that the inode might be dirtied by another container. I
> was thinking that, at least in the case of NFS, it makes sense to report
> the node name of the container that did the original mount. Of course
> this doesn't address the general RPC client case and, like patch 3, it
> makes the superblock solution rather NFS-specific. That brings me to a
> basic question: Are there any RPC clients in the kernel that do not
> operate on behalf of NFS?
Yes. There is, for instance, the portmapper/rpcbind client. The NFS and
lockd servers also sometimes need to send RPC callbacks. Assuming that
you can convert the RPC client into something NFS-specific is therefore
not an option.
Trond
On Tue, 2009-01-06 at 18:35 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > > That implies to me you want to capture the value at mount time, and to
> > > > pass it in to the rpc_call creation, and only at very specific well
> > > > defined points where we interact with user space should we examine
> > > > current->utsname(). At which point there should be no question
> > > > of current->utsname() is valid as the user space process is alive.
> > >
> > > Why pretend that the filesystem is owned by a particular namespace? It
> > > can, and will be shared among many containers...
> >
> > If the only purpose of this is to fill in the auth_unix cred then
> > shouldn't it be part of whatever cred structures are passed around?
>
> So how does tracking it in a shared structure like the rpc_client help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...
>
> Trond
I originally discarded the idea of relating it to the credentials
because it looked like the credential cache could be flushed at any time
(according to the small portion of the RFC I read), so I was worried we'd
have even worse lifetime issues to deal with.
Cheers,
-Matt
On Jan 6, 2009, at 6:35 PM, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
>> On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
>>> On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
>>>> That implies to me you want to capture the value at mount time,
>>>> and to
>>>> pass it in to the rpc_call creation, and only at very specific well
>>>> defined points where we interact with user space should we examine
>>>> current->utsname(). At which point there should be no question
>>>> of current->utsname() is valid as the user space process is alive.
>>>
>>> Why pretend that the filesystem is owned by a particular
>>> namespace? It
>>> can, and will be shared among many containers...
>>
>> If the only purpose of this is to fill in the auth_unix cred then
>> shouldn't it be part of whatever cred structures are passed around?
>
> So how does tracking it in a shared structure like the rpc_client
> help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...
I think generating a proper AUTH_SYS cred, given the existence of
containers, is the essential question here.
However, we use nodename for lock owners too... perhaps that deserves
a separate solution.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > > That implies to me you want to capture the value at mount time, and to
> > > > pass it in to the rpc_call creation, and only at very specific well
> > > > defined points where we interact with user space should we examine
> > > > current->utsname(). At which point there should be no question
> > > > of current->utsname() is valid as the user space process is alive.
> > >
> > > Why pretend that the filesystem is owned by a particular namespace? It
> > > can, and will be shared among many containers...
> >
> > If the only purpose of this is to fill in the auth_unix cred then
> > shouldn't it be part of whatever cred structures are passed around?
>
> So how does tracking it in a shared structure like the rpc_client help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...
Right, that's what I meant.
It seems like overkill, though. Does anyone actually care whether these
names are right?
--b.
On Tue, 2009-01-06 at 18:43 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 15:18 -0800, Matt Helsley wrote:
> > Yes, I was aware that the inode might be dirtied by another container. I
> > was thinking that, at least in the case of NFS, it makes sense to report
> > the node name of the container that did the original mount. Of course
> > this doesn't address the general RPC client case and, like patch 3, it
> > makes the superblock solution rather NFS-specific. That brings me to a
> > basic question: Are there any RPC clients in the kernel that do not
> > operate on behalf of NFS?
>
> Yes. There is, for instance, the portmapper/rpcbind client.
<snip>
OK that's a good example. Point taken.
> The NFS and
> lockd servers also sometime needs to send RPC callbacks.
Yes, these won't have an NFS superblock handy, and hence they use the
node name cached in the RPC client struct. That means that these patches
have only addressed NFS client-side namespace issues. Have you seen J.
Bruce Fields' post on the nfsd side of things?
Cheers,
-Matt Helsley