2009-01-06 01:13:47

by Matt Helsley

[permalink] [raw]
Subject: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

We can often specify the UTS namespace to use when starting an RPC client.
However sometimes no UTS namespace is available (specifically during system
shutdown as the last NFS mount in a container is unmounted) so fall
back to the initial UTS namespace.

Signed-off-by: Matt Helsley <[email protected]>
Cc: Cedric Le Goater <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: [email protected]
Cc: Trond Myklebust <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Linux Containers <[email protected]>

---
net/sunrpc/clnt.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6.28/net/sunrpc/clnt.c
===================================================================
--- linux-2.6.28.orig/net/sunrpc/clnt.c
+++ linux-2.6.28/net/sunrpc/clnt.c
@@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
struct rpc_version *version;
struct rpc_clnt *clnt = NULL;
struct rpc_auth *auth;
+ struct new_utsname *uts_ns = init_utsname();
int err;
size_t len;

@@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
}

/* save the nodename */
- clnt->cl_nodelen = strlen(init_utsname()->nodename);
+ if (current->nsproxy != NULL)
+ uts_ns = utsname();
+ clnt->cl_nodelen = strlen(uts_ns->nodename);
if (clnt->cl_nodelen > UNX_MAXNODENAME)
clnt->cl_nodelen = UNX_MAXNODENAME;
- memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
+ memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
rpc_register_client(clnt);
return clnt;


--


2009-01-06 20:03:25

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

Quoting Matt Helsley ([email protected]):
> We can often specify the UTS namespace to use when starting an RPC client.
> However sometimes no UTS namespace is available (specifically during system
> shutdown as the last NFS mount in a container is unmounted) so fall
> back to the initial UTS namespace.

So what happens if we take this patch and do nothing else?

The only potential problem situation will be rpc requests
made on behalf of a container in which the last task has
exited, right? So let's say a container did an nfs mount
and then exits, causing an nfs umount request.

That umount request will now be sent with the wrong nodename.
Does that actually cause problems, will the server use the
nodename to try and determine the client sending the request?

thanks,
-serge

> Signed-off-by: Matt Helsley <[email protected]>
> Cc: Cedric Le Goater <[email protected]>
> Cc: Linux Kernel Mailing List <[email protected]>
> Cc: [email protected]
> Cc: Trond Myklebust <[email protected]>
> Cc: Chuck Lever <[email protected]>
> Cc: Eric W. Biederman <[email protected]>
> Cc: Linux Containers <[email protected]>
>
> ---
> net/sunrpc/clnt.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.28/net/sunrpc/clnt.c
> ===================================================================
> --- linux-2.6.28.orig/net/sunrpc/clnt.c
> +++ linux-2.6.28/net/sunrpc/clnt.c
> @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
> struct rpc_version *version;
> struct rpc_clnt *clnt = NULL;
> struct rpc_auth *auth;
> + struct new_utsname *uts_ns = init_utsname();
> int err;
> size_t len;
>
> @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
> }
>
> /* save the nodename */
> - clnt->cl_nodelen = strlen(init_utsname()->nodename);
> + if (current->nsproxy != NULL)
> + uts_ns = utsname();
> + clnt->cl_nodelen = strlen(uts_ns->nodename);
> if (clnt->cl_nodelen > UNX_MAXNODENAME)
> clnt->cl_nodelen = UNX_MAXNODENAME;
> - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
> + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
> rpc_register_client(clnt);
> return clnt;
>
>
> --
> _______________________________________________
> Containers mailing list
> [email protected]
> https://lists.linux-foundation.org/mailman/listinfo/containers

2009-01-06 20:21:09

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> Quoting Matt Helsley ([email protected]):
> > We can often specify the UTS namespace to use when starting an RPC client.
> > However sometimes no UTS namespace is available (specifically during system
> > shutdown as the last NFS mount in a container is unmounted) so fall
> > back to the initial UTS namespace.
>
> So what happens if we take this patch and do nothing else?
>
> The only potential problem situation will be rpc requests
> made on behalf of a container in which the last task has
> exited, right? So let's say a container did an nfs mount
> and then exits, causing an nfs umount request.
>
> That umount request will now be sent with the wrong nodename.
> Does that actually cause problems, will the server use the
> nodename to try and determine the client sending the request?

This is just the machine name in the auth_unix credential? The linux
server ignores that completely (for the purpose of auth_unix
authenication, it identifies clients only by source ip address). I
suspect other servers also ignore it, but I don't know.

--b.

>
> thanks,
> -serge
>
> > Signed-off-by: Matt Helsley <[email protected]>
> > Cc: Cedric Le Goater <[email protected]>
> > Cc: Linux Kernel Mailing List <[email protected]>
> > Cc: [email protected]
> > Cc: Trond Myklebust <[email protected]>
> > Cc: Chuck Lever <[email protected]>
> > Cc: Eric W. Biederman <[email protected]>
> > Cc: Linux Containers <[email protected]>
> >
> > ---
> > net/sunrpc/clnt.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6.28/net/sunrpc/clnt.c
> > ===================================================================
> > --- linux-2.6.28.orig/net/sunrpc/clnt.c
> > +++ linux-2.6.28/net/sunrpc/clnt.c
> > @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
> > struct rpc_version *version;
> > struct rpc_clnt *clnt = NULL;
> > struct rpc_auth *auth;
> > + struct new_utsname *uts_ns = init_utsname();
> > int err;
> > size_t len;
> >
> > @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
> > }
> >
> > /* save the nodename */
> > - clnt->cl_nodelen = strlen(init_utsname()->nodename);
> > + if (current->nsproxy != NULL)
> > + uts_ns = utsname();
> > + clnt->cl_nodelen = strlen(uts_ns->nodename);
> > if (clnt->cl_nodelen > UNX_MAXNODENAME)
> > clnt->cl_nodelen = UNX_MAXNODENAME;
> > - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
> > + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
> > rpc_register_client(clnt);
> > return clnt;
> >
> >
> > --
> > _______________________________________________
> > Containers mailing list
> > [email protected]
> > https://lists.linux-foundation.org/mailman/listinfo/containers

2009-01-06 20:45:04

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> Quoting Matt Helsley ([email protected]):
> > We can often specify the UTS namespace to use when starting an RPC client.
> > However sometimes no UTS namespace is available (specifically during system
> > shutdown as the last NFS mount in a container is unmounted) so fall
> > back to the initial UTS namespace.
>
> So what happens if we take this patch and do nothing else?
>
> The only potential problem situation will be rpc requests
> made on behalf of a container in which the last task has
> exited, right? So let's say a container did an nfs mount
> and then exits, causing an nfs umount request.
>
> That umount request will now be sent with the wrong nodename.
> Does that actually cause problems, will the server use the
> nodename to try and determine the client sending the request?

The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
userspace, not the kernel. The problem here is that because lazy mounts
exist, the lifetime of the RPC client may be longer than that of the
container. In addition, it may be shared among more than 1 container,
because superblocks can be shared.

One thing you need to be aware of here is that inode dirty data
writebacks may be initiated by completely different processes than the
one that dirtied the inode.
IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
rely on being able to determine the container-specific node name at RPC
generation time are therefore going to return incorrect values.

Trond

2009-01-06 21:53:46

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

Quoting J. Bruce Fields ([email protected]):
> On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley ([email protected]):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> >
> > So what happens if we take this patch and do nothing else?
> >
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right? So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> >
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
>
> This is just the machine name in the auth_unix credential? The linux
> server ignores that completely (for the purpose of auth_unix
> authenication, it identifies clients only by source ip address). I
> suspect other servers also ignore it, but I don't know.

Thanks, that's what i was hoping...

Matt, have you audited the other rpc-based services? Do any
of them care?

-serge

2009-01-06 21:58:51

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

Quoting Trond Myklebust ([email protected]):
> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley ([email protected]):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> >
> > So what happens if we take this patch and do nothing else?
> >
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right? So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> >
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
>
> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
> userspace, not the kernel. The problem here is that because lazy mounts
> exist, the lifetime of the RPC client may be longer than that of the

Right that was what i was referring to.

> container. In addition, it may be shared among more than 1 container,
> because superblocks can be shared.

Good point. And in that case what do we care about (even though
apparently we just might not care at all :) - who did the mount,
or who is using it?

In fact one thing I noticed in Matt's patch 3 was that he copied
in the nodename verbatim, so a future hostname() by the container
wouldn't be reflected, again not sure if that would matter.

> One thing you need to be aware of here is that inode dirty data
> writebacks may be initiated by completely different processes than the
> one that dirtied the inode.

Right, but I *was* thinking that we wanted to associate the nodename
on the rpc calls with the hostname of the mounter, not the actor. Maybe
you'll tell me above that that is bogus.

> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
> rely on being able to determine the container-specific node name at RPC
> generation time are therefore going to return incorrect values.

So should we use patch 2/4, plus (as someone - was it you? - suggested)
using a DEFAULT instead of init_utsname()->nodename when
current->utsname() == NULL?

-serge

2009-01-06 22:38:37

by Chuck Lever III

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Jan 6, 2009, at Jan 6, 2009, 3:02 PM, Serge E. Hallyn wrote:
> Quoting Matt Helsley ([email protected]):
>> We can often specify the UTS namespace to use when starting an RPC
>> client.
>> However sometimes no UTS namespace is available (specifically
>> during system
>> shutdown as the last NFS mount in a container is unmounted) so fall
>> back to the initial UTS namespace.
>
> So what happens if we take this patch and do nothing else?

I thought the point of this was to prevent incorrect container
nodenames from leaking onto the network.

> The only potential problem situation will be rpc requests
> made on behalf of a container in which the last task has
> exited, right? So let's say a container did an nfs mount
> and then exits, causing an nfs umount request.
>
> That umount request will now be sent with the wrong nodename.
> Does that actually cause problems, will the server use the
> nodename to try and determine the client sending the request?
>
> thanks,
> -serge
>
>> Signed-off-by: Matt Helsley <[email protected]>
>> Cc: Cedric Le Goater <[email protected]>
>> Cc: Linux Kernel Mailing List <[email protected]>
>> Cc: [email protected]
>> Cc: Trond Myklebust <[email protected]>
>> Cc: Chuck Lever <[email protected]>
>> Cc: Eric W. Biederman <[email protected]>
>> Cc: Linux Containers <[email protected]>
>>
>> ---
>> net/sunrpc/clnt.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6.28/net/sunrpc/clnt.c
>> ===================================================================
>> --- linux-2.6.28.orig/net/sunrpc/clnt.c
>> +++ linux-2.6.28/net/sunrpc/clnt.c
>> @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
>> struct rpc_version *version;
>> struct rpc_clnt *clnt = NULL;
>> struct rpc_auth *auth;
>> + struct new_utsname *uts_ns = init_utsname();
>> int err;
>> size_t len;
>>
>> @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
>> }
>>
>> /* save the nodename */
>> - clnt->cl_nodelen = strlen(init_utsname()->nodename);
>> + if (current->nsproxy != NULL)
>> + uts_ns = utsname();
>> + clnt->cl_nodelen = strlen(uts_ns->nodename);
>> if (clnt->cl_nodelen > UNX_MAXNODENAME)
>> clnt->cl_nodelen = UNX_MAXNODENAME;
>> - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt-
>> >cl_nodelen);
>> + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
>> rpc_register_client(clnt);
>> return clnt;
>>
>>
>> --
>> _______________________________________________
>> Containers mailing list
>> [email protected]
>> https://lists.linux-foundation.org/mailman/listinfo/containers

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



2009-01-06 22:42:42

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:

> So should we use patch 2/4, plus (as someone - was it you? - suggested)
> using a DEFAULT instead of init_utsname()->nodename when
> current->utsname() == NULL?

No. I'm don't think that 2/4 is correct either. Basically, 2/4 is saying
that the container that first mounts the filesystem 'owns' it. However
at the same time we know that the lifetime of the filesystem is in no
way bounded by the lifetime of the container, and that's what gets you
into trouble with 'umount' in the first place.

IMO, the current code is the most correct approach, in that it assumes
that the filesystems are owned by the 'init' namespace.

Trond

2009-01-06 23:07:59

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

"Serge E. Hallyn" <[email protected]> writes:

> Quoting Trond Myklebust ([email protected]):
>> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
>> > Quoting Matt Helsley ([email protected]):
>> > > We can often specify the UTS namespace to use when starting an RPC client.
>> > > However sometimes no UTS namespace is available (specifically during
> system
>> > > shutdown as the last NFS mount in a container is unmounted) so fall
>> > > back to the initial UTS namespace.
>> >
>> > So what happens if we take this patch and do nothing else?
>> >
>> > The only potential problem situation will be rpc requests
>> > made on behalf of a container in which the last task has
>> > exited, right? So let's say a container did an nfs mount
>> > and then exits, causing an nfs umount request.
>> >
>> > That umount request will now be sent with the wrong nodename.
>> > Does that actually cause problems, will the server use the
>> > nodename to try and determine the client sending the request?
>>
>> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
>> userspace, not the kernel. The problem here is that because lazy mounts
>> exist, the lifetime of the RPC client may be longer than that of the
>
> Right that was what i was referring to.
>
>> container. In addition, it may be shared among more than 1 container,
>> because superblocks can be shared.
>
> Good point. And in that case what do we care about (even though
> apparently we just might not care at all :) - who did the mount,
> or who is using it?
>
> In fact one thing I noticed in Matt's patch 3 was that he copied
> in the nodename verbatim, so a future hostname() by the container
> wouldn't be reflected, again not sure if that would matter.
>
>> One thing you need to be aware of here is that inode dirty data
>> writebacks may be initiated by completely different processes than the
>> one that dirtied the inode.
>
> Right, but I *was* thinking that we wanted to associate the nodename
> on the rpc calls with the hostname of the mounter, not the actor. Maybe
> you'll tell me above that that is bogus.
>
>> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
>> rely on being able to determine the container-specific node name at RPC
>> generation time are therefore going to return incorrect values.
>
> So should we use patch 2/4, plus (as someone - was it you? - suggested)
> using a DEFAULT instead of init_utsname()->nodename when
> current->utsname() == NULL?

Is there any reason to believe that the kernel helper threads will ever
have a useful namespace value? I don't think so.

That implies to me you want to capture the value at mount time, and to
pass it in to the rpc_call creation, and only at very specific well
defined points where we interact with user space should we examine
current->utsname(). At which point there should be no question
of current->utsname() is valid as the user space process is alive.

Eric

2009-01-06 23:15:56

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> That implies to me you want to capture the value at mount time, and to
> pass it in to the rpc_call creation, and only at very specific well
> defined points where we interact with user space should we examine
> current->utsname(). At which point there should be no question
> of current->utsname() is valid as the user space process is alive.

Why pretend that the filesystem is owned by a particular namespace? It
can, and will be shared among many containers...

Trond

2009-01-06 23:30:53

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 15:20 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley ([email protected]):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> >
> > So what happens if we take this patch and do nothing else?
> >
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right? So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> >
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
>
> This is just the machine name in the auth_unix credential? The linux
> server ignores that completely (for the purpose of auth_unix
> authenication, it identifies clients only by source ip address). I
> suspect other servers also ignore it, but I don't know.
>
> --b.

I was wondering about this because I kept coming back to the question of
how the server could trust the name the client supplies (seems it
can't). This is very useful information -- it certainly suggests that
this patch would be sufficient.

Thanks!

Cheers,
-Matt Helsley

> >
> > thanks,
> > -serge
> >
> > > Signed-off-by: Matt Helsley <[email protected]>
> > > Cc: Cedric Le Goater <[email protected]>
> > > Cc: Linux Kernel Mailing List <[email protected]>
> > > Cc: [email protected]
> > > Cc: Trond Myklebust <[email protected]>
> > > Cc: Chuck Lever <[email protected]>
> > > Cc: Eric W. Biederman <[email protected]>
> > > Cc: Linux Containers <[email protected]>
> > >
> > > ---
> > > net/sunrpc/clnt.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > Index: linux-2.6.28/net/sunrpc/clnt.c
> > > ===================================================================
> > > --- linux-2.6.28.orig/net/sunrpc/clnt.c
> > > +++ linux-2.6.28/net/sunrpc/clnt.c
> > > @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
> > > struct rpc_version *version;
> > > struct rpc_clnt *clnt = NULL;
> > > struct rpc_auth *auth;
> > > + struct new_utsname *uts_ns = init_utsname();
> > > int err;
> > > size_t len;
> > >
> > > @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
> > > }
> > >
> > > /* save the nodename */
> > > - clnt->cl_nodelen = strlen(init_utsname()->nodename);
> > > + if (current->nsproxy != NULL)
> > > + uts_ns = utsname();
> > > + clnt->cl_nodelen = strlen(uts_ns->nodename);
> > > if (clnt->cl_nodelen > UNX_MAXNODENAME)
> > > clnt->cl_nodelen = UNX_MAXNODENAME;
> > > - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
> > > + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
> > > rpc_register_client(clnt);
> > > return clnt;
> > >
> > >
> > > --
> > > _______________________________________________
> > > Containers mailing list
> > > [email protected]
> > > https://lists.linux-foundation.org/mailman/listinfo/containers

2009-01-06 23:31:33

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 15:44 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley ([email protected]):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> >
> > So what happens if we take this patch and do nothing else?
> >
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right? So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> >
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
>
> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
> userspace, not the kernel. The problem here is that because lazy mounts

Ahh, that's news to me. I thought userspace originated the umount but
the kernel constructed and sent the corresponding RPC call.

> exist, the lifetime of the RPC client may be longer than that of the
> container. In addition, it may be shared among more than 1 container,
> because superblocks can be shared.

Right.

> One thing you need to be aware of here is that inode dirty data
> writebacks may be initiated by completely different processes than the
> one that dirtied the inode.
> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
> rely on being able to determine the container-specific node name at RPC
> generation time are therefore going to return incorrect values.

Yes, I was aware that the inode might be dirtied by another container. I
was thinking that, at least in the case of NFS, it makes sense to report
the node name of the container that did the original mount. Of course
this doesn't address the general RPC client case and, like patch 3, it
makes the superblock solution rather NFS-specific. That brings me to a
basic question: Are there any RPC clients in the kernel that do not
operate on behalf of NFS?

Thanks!

Cheers,
-Matt Helsley

2009-01-06 23:31:50

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:
> Quoting Trond Myklebust ([email protected]):
> > On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley ([email protected]):
> > > > We can often specify the UTS namespace to use when starting an RPC client.
> > > > However sometimes no UTS namespace is available (specifically during system
> > > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > > back to the initial UTS namespace.
> > >
> > > So what happens if we take this patch and do nothing else?
> > >
> > > The only potential problem situation will be rpc requests
> > > made on behalf of a container in which the last task has
> > > exited, right? So let's say a container did an nfs mount
> > > and then exits, causing an nfs umount request.
> > >
> > > That umount request will now be sent with the wrong nodename.
> > > Does that actually cause problems, will the server use the
> > > nodename to try and determine the client sending the request?
> >
> > The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
> > userspace, not the kernel. The problem here is that because lazy mounts
> > exist, the lifetime of the RPC client may be longer than that of the
>
> Right that was what i was referring to.
>
> > container. In addition, it may be shared among more than 1 container,
> > because superblocks can be shared.
>
> Good point. And in that case what do we care about (even though
> apparently we just might not care at all :) - who did the mount,
> or who is using it?
>
> In fact one thing I noticed in Matt's patch 3 was that he copied
> in the nodename verbatim, so a future hostname() by the container
> wouldn't be reflected, again not sure if that would matter.

I thought of this. I found the patches that added the nodename in the
RPC client struct and the stale nodename was intentional. That's why I
preserved the copy rather than called utsname() each time.

Cheers,
-Matt

2009-01-06 23:33:03

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > That implies to me you want to capture the value at mount time, and to
> > pass it in to the rpc_call creation, and only at very specific well
> > defined points where we interact with user space should we examine
> > current->utsname(). At which point there should be no question
> > of current->utsname() is valid as the user space process is alive.
>
> Why pretend that the filesystem is owned by a particular namespace? It
> can, and will be shared among many containers...

If the only purpose of this is to fill in the auth_unix cred then
shouldn't it be part of whatever cred structures are passed around?

But I doubt it's that important to get it right.

--b.

2009-01-06 23:35:36

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 15:53 -0600, Serge E. Hallyn wrote:
> Quoting J. Bruce Fields ([email protected]):
> > On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley ([email protected]):
> > > > We can often specify the UTS namespace to use when starting an RPC client.
> > > > However sometimes no UTS namespace is available (specifically during system
> > > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > > back to the initial UTS namespace.
> > >
> > > So what happens if we take this patch and do nothing else?
> > >
> > > The only potential problem situation will be rpc requests
> > > made on behalf of a container in which the last task has
> > > exited, right? So let's say a container did an nfs mount
> > > and then exits, causing an nfs umount request.
> > >
> > > That umount request will now be sent with the wrong nodename.
> > > Does that actually cause problems, will the server use the
> > > nodename to try and determine the client sending the request?
> >
> > This is just the machine name in the auth_unix credential? The linux
> > server ignores that completely (for the purpose of auth_unix
> > authenication, it identifies clients only by source ip address). I
> > suspect other servers also ignore it, but I don't know.
>
> Thanks, that's what i was hoping...
>
> Matt, have you audited the other rpc-based services? Do any
> of them care?

Frankly, I did not audit any of the RPC-based services to see if any
truly cared about the node name. That's actually a rather large scope
when you consider it -- I'd have to look at much more than just "Linux"
RPC clients. It seemed unsafe to assume _nobody_ would care after they
bothered to put it into the spec.

Hopefully I'm wrong though :).

Cheers,
-Matt

2009-01-06 23:36:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > That implies to me you want to capture the value at mount time, and to
> > > pass it in to the rpc_call creation, and only at very specific well
> > > defined points where we interact with user space should we examine
> > > current->utsname(). At which point there should be no question
> > > of current->utsname() is valid as the user space process is alive.
> >
> > Why pretend that the filesystem is owned by a particular namespace? It
> > can, and will be shared among many containers...
>
> If the only purpose of this is to fill in the auth_unix cred then
> shouldn't it be part of whatever cred structures are passed around?

So how does tracking it in a shared structure like the rpc_client help?
If you consider it to be part of the cred, then it needs to be tracked
in the cred...

Trond

2009-01-06 23:44:35

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 15:18 -0800, Matt Helsley wrote:
> Yes, I was aware that the inode might be dirtied by another container. I
> was thinking that, at least in the case of NFS, it makes sense to report
> the node name of the container that did the original mount. Of course
> this doesn't address the general RPC client case and, like patch 3, it
> makes the superblock solution rather NFS-specific. That brings me to a
> basic question: Are there any RPC clients in the kernel that do not
> operate on behalf of NFS?

Yes. There is, for instance, the portmapper/rpcbind client. The NFS and
lockd servers also sometime needs to send RPC callbacks. Assuming that
you can convert the RPC client into something NFS-specific is therefore
not an option.

Trond

2009-01-06 23:49:19

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 18:35 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > > That implies to me you want to capture the value at mount time, and to
> > > > pass it in to the rpc_call creation, and only at very specific well
> > > > defined points where we interact with user space should we examine
> > > > current->utsname(). At which point there should be no question
> > > > of current->utsname() is valid as the user space process is alive.
> > >
> > > Why pretend that the filesystem is owned by a particular namespace? It
> > > can, and will be shared among many containers...
> >
> > If the only purpose of this is to fill in the auth_unix cred then
> > shouldn't it be part of whatever cred structures are passed around?
>
> So how does tracking it in a shared structure like the rpc_client help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...
>
> Trond

I originally discarded the idea of relating it to the credentials
because it looked like the credential cache could be flushed at any time
(according to the small portion of the RFC I read) so I was worried we'd
have even worse lifetime issues to deal with.

Cheers,
-Matt

2009-01-06 23:53:42

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > > That implies to me you want to capture the value at mount time, and to
> > > > pass it in to the rpc_call creation, and only at very specific well
> > > > defined points where we interact with user space should we examine
> > > > current->utsname(). At which point there should be no question
> > > > of current->utsname() is valid as the user space process is alive.
> > >
> > > Why pretend that the filesystem is owned by a particular namespace? It
> > > can, and will be shared among many containers...
> >
> > If the only purpose of this is to fill in the auth_unix cred then
> > shouldn't it be part of whatever cred structures are passed around?
>
> So how does tracking it in a shared structure like the rpc_client help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...

Right, that's what I meant.

It seems like overkill, though. Does anyone actually care whether these
names are right?

--b.

2009-01-06 23:53:58

by Chuck Lever III

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Jan 6, 2009, at Jan 6, 2009, 6:35 PM, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
>> On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
>>> On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
>>>> That implies to me you want to capture the value at mount time,
>>>> and to
>>>> pass it in to the rpc_call creation, and only at very specific well
>>>> defined points where we interact with user space should we examine
>>>> current->utsname(). At which point there should be no question
>>>> of current->utsname() is valid as the user space process is alive.
>>>
>>> Why pretend that the filesystem is owned by a particular
>>> namespace? It
>>> can, and will be shared among many containers...
>>
>> If the only purpose of this is to fill in the auth_unix cred then
>> shouldn't it be part of whatever cred structures are passed around?
>
> So how does tracking it in a shared structure like the rpc_client
> help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...

I think generating a proper AUTH_SYS cred, given the existence of
containers, is the essential question here.

However, we use nodename for lock owners too... perhaps that deserves
a separate solution.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2009-01-06 23:58:49

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 18:43 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 15:18 -0800, Matt Helsley wrote:
> > Yes, I was aware that the inode might be dirtied by another container. I
> > was thinking that, at least in the case of NFS, it makes sense to report
> > the node name of the container that did the original mount. Of course
> > this doesn't address the general RPC client case and, like patch 3, it
> > makes the superblock solution rather NFS-specific. That brings me to a
> > basic question: Are there any RPC clients in the kernel that do not
> > operate on behalf of NFS?
>
> Yes. There is, for instance, the portmapper/rpcbind client.

<snip>

OK that's a good example. Point taken.

> The NFS and
> lockd servers also sometime needs to send RPC callbacks.

Yes, these won't have an NFS superblock handy and hence they use the
node name cached in the RPC client struct. That means that these patches
have only addressed NFS client-side namespace issues. Have you seen J.
Bruce Fields post on the nfsd side of things?

Cheers,
-Matt Helsley

2009-01-07 00:01:48

by Serge E. Hallyn

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

Quoting Chuck Lever ([email protected]):
> On Jan 6, 2009, at Jan 6, 2009, 3:02 PM, Serge E. Hallyn wrote:
>> Quoting Matt Helsley ([email protected]):
>>> We can often specify the UTS namespace to use when starting an RPC
>>> client.
>>> However sometimes no UTS namespace is available (specifically during
>>> system
>>> shutdown as the last NFS mount in a container is unmounted) so fall
>>> back to the initial UTS namespace.
>>
>> So what happens if we take this patch and do nothing else?
>
> I thought the point of this was to prevent incorrect container nodenames
> from leaking onto the network.

But define incorrect. If container B does an nfs mount, container c
is launched with a tree in that mount, container B dies, and container C
umounts it. Should the umount belong to container B (for having
mounted it), container C (for having umount it), or the init_utsname
(for being the physical host/kernel)?

I get the feeling that consensus on this thread is that init_utsname
is actually the best choice, but OTOH if I have 3 containers on my
host, for apache, mysql, and postfix servers, and each is doing
nfs mounts from a physically remote machine, maybe I care about
having them report separate nodenames?

(that's a question, I really don't know...)

-serge

2009-01-07 00:08:19

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > > > That implies to me you want to capture the value at mount time, and to
> > > > > pass it in to the rpc_call creation, and only at very specific well
> > > > > defined points where we interact with user space should we examine
> > > > > current->utsname(). At which point there should be no question
> > > > > of current->utsname() is valid as the user space process is alive.
> > > >
> > > > Why pretend that the filesystem is owned by a particular namespace? It
> > > > can, and will be shared among many containers...
> > >
> > > If the only purpose of this is to fill in the auth_unix cred then
> > > shouldn't it be part of whatever cred structures are passed around?
> >
> > So how does tracking it in a shared structure like the rpc_client help?
> > If you consider it to be part of the cred, then it needs to be tracked
> > in the cred...
>
> Right, that's what I meant.
>
> It seems like overkill, though. Does anyone actually care whether these
> names are right?

That's certainly a tempting angle. However we may not "control" the
server code -- couldn't there be some oddball (maybe even proprietary)
NFS servers out there that users do care about interacting with?

Cheers,
-Matt Helsley

2009-01-07 00:09:25

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 17:42 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:
>
> > So should we use patch 2/4, plus (as someone - was it you? - suggested)
> > using a DEFAULT instead of init_utsname()->nodename when
> > current->utsname() == NULL?
>
> No. I'm don't think that 2/4 is correct either. Basically, 2/4 is saying
> that the container that first mounts the filesystem 'owns' it. However
> at the same time we know that the lifetime of the filesystem is in no
> way bounded by the lifetime of the container, and that's what gets you
> into trouble with 'umount' in the first place.
>
> IMO, the current code is the most correct approach, in that it assumes
> that the filesystems are owned by the 'init' namespace.

IMHO This seems more incorrect than trying to use a more proximal
namespace.

Cheers,
-Matt Helsley

2009-01-07 00:20:41

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote:
> It seems like overkill, though. Does anyone actually care whether these
> names are right?

I very much doubt it. Particularly since the actual sockets used by the
RPC client are always owned by the init namespace (they are established
from a workqueue process), and so the source IP address will always be
that of the init namespace.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-01-07 00:20:57

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 18:35 -0500, Trond Myklebust wrote:
> So how does tracking it in a shared structure like the rpc_client help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...

OK. If people really want to track this, then you could add a reference
to the current->nsproxy to the struct rpc_cred and struct auth_cred, and
ensure that the unx_marshal and unx_match routines do the right thing
w.r.t. that reference (if it exists).
However such a patch had better be accompanied with a _really_
convincing argument for why containers need this...

As for the NFSv4 clientid, I can't see how you would ever want to use
anything other than the init->utsname(), since the requirement is only
that the clientid string be unique and preserved across reboots. The
server isn't allowed to interpret the contents of the clientid string.
Ditto for the RPCSEC_GSS machine creds that are used to establish the
clientid.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-01-07 00:21:27

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> IMHO This seems more incorrect than trying to use a more proximal
> namespace.

You have yet to explain why.
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-01-07 00:21:43

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, Jan 06, 2009 at 04:08:50PM -0800, Matt Helsley wrote:
> On Tue, 2009-01-06 at 17:42 -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:
> >
> > > So should we use patch 2/4, plus (as someone - was it you? - suggested)
> > > using a DEFAULT instead of init_utsname()->nodename when
> > > current->utsname() == NULL?
> >
> > No. I'm don't think that 2/4 is correct either. Basically, 2/4 is saying
> > that the container that first mounts the filesystem 'owns' it. However
> > at the same time we know that the lifetime of the filesystem is in no
> > way bounded by the lifetime of the container, and that's what gets you
> > into trouble with 'umount' in the first place.
> >
> > IMO, the current code is the most correct approach, in that it assumes
> > that the filesystems are owned by the 'init' namespace.
>
> IMHO This seems more incorrect than trying to use a more proximal
> namespace.

If it would be possible, for example, for the 'init' namespace to have
no network interfaces at all, then it would be nicer to use a name
that's at least been used with nfs at *some* point--just on the general
principle of not leaking information to a domain that the user wouldn't
expect it to.

(Assuming it's unlikely anyone would consider init's utsname to be
sensitive information, that's a minor point.)

--b.

2009-01-07 00:23:40

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 19:20 -0500, J. Bruce Fields wrote:
> If it would be possible, for example, for the 'init' namespace to have
> no network interfaces at all, then it would be nicer to use a name
> that's at least been used with nfs at *some* point--just on the general
> principle of not leaking information to a domain that the user wouldn't
> expect it to.

Then RPC would fail. Thanks to the limitations imposed by selinux &
friends, all RPC sockets have to be owned by the init process.
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-01-07 00:26:27

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, Jan 06, 2009 at 07:20:07PM -0500, Trond Myklebust wrote:
> As for the NFSv4 clientid, I can't see how you would ever want to use
> anything other than the init->utsname(), since the requirement is only
> that the clientid string be unique and preserved across reboots. The
> server isn't allowed to interpret the contents of the clientid string.
> Ditto for the RPCSEC_GSS machine creds that are used to establish the
> clientid.

If people eventually expect to be able to, say, migrate a container to
another host while using an nfs mount as their storage, then they'd need
the name to be that of the container, not of the host.

Obviously we'd also need to ensure the container got its own nfsv4
client state, etc., etc., and it sounds like we're far from that.

--b.

2009-01-07 00:38:53

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 19:26 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 07:20:07PM -0500, Trond Myklebust wrote:
> > As for the NFSv4 clientid, I can't see how you would ever want to use
> > anything other than the init->utsname(), since the requirement is only
> > that the clientid string be unique and preserved across reboots. The
> > server isn't allowed to interpret the contents of the clientid string.
> > Ditto for the RPCSEC_GSS machine creds that are used to establish the
> > clientid.
>
> If people eventually expect to be able to, say, migrate a container to
> another host while using an nfs mount as their storage, then they'd need
> the name to be that of the container, not of the host.

Why?

> Obviously we'd also need to ensure the container got its own nfsv4
> client state, etc., etc., and it sounds like we're far from that.

Again, why? Are you seriously proposing a plan to transport all NFS and
locking state directly from one kernel to another?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-01-07 00:43:36

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 19:20 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> > IMHO This seems more incorrect than trying to use a more proximal
> > namespace.
>
> You have yet to explain why.

It's "more proximal" -- it's closer to the container that we expect to
cause (directly or otherwise) the bulk of the RPC calls for that mount.
If the container does not wind up sharing that mount with other
containers then the reported node name matches. If the container winds
up sharing the mount with other containers then at least we can learn
which container originated the mount.

I imagine an NFS administrator trying to determine the source of a bunch
of RPC calls. If we just report the initial namespace then that
administrator has to do lots more digging to determine which container
sent the calls (assuming they aren't in different network namespaces).
By not always reporting the initial namespace we may give the
administrator one way to narrow down the search. Even if the reported
node name does not perfectly match the source of all RPC traffic related
to the mount at least the administrator gets something more specific.

Cheers,
-Matt Helsley

2009-01-07 00:46:48

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> IMHO This seems more incorrect than trying to use a more proximal
namespace.

You have yet to explain why.



2009-01-07 00:57:44

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 19:39 -0500, [email protected] wrote:
> On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> > IMHO This seems more incorrect than trying to use a more proximal
> namespace.
>
> You have yet to explain why.

Yes, I know I have to explain why. If you would please have a little
more patience you might see I have replied to your original request for
an explanation -- I *just* sent it out. If I don't see it show up soon
I'll resend it as a reply here. OK?

Cheers,
-Matt Helsley

2009-01-07 00:58:26

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

Matt Helsley <[email protected]> writes:

> On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote:
>> On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote:
>> > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
>> > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
>> > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
>> > > > > That implies to me you want to capture the value at mount time, and to
>> > > > > pass it in to the rpc_call creation, and only at very specific well
>> > > > > defined points where we interact with user space should we examine
>> > > > > current->utsname(). At which point there should be no question
>> > > > > of current->utsname() is valid as the user space process is alive.
>> > > >
>> > > > Why pretend that the filesystem is owned by a particular namespace? It
>> > > > can, and will be shared among many containers...

Sounds right. Still like the owner of a file it can happen that some
containers are more correct than others. Especially in the context of
mount merging and the other sophisticated caching that happens in NFS
this increasingly sounds like something that belongs in the cred as
that is where it is used.

>> > > If the only purpose of this is to fill in the auth_unix cred then
>> > > shouldn't it be part of whatever cred structures are passed around?
>> >
>> > So how does tracking it in a shared structure like the rpc_client help?
>> > If you consider it to be part of the cred, then it needs to be tracked
>> > in the cred...
>>
>> Right, that's what I meant.
>>
>> It seems like overkill, though. Does anyone actually care whether these
>> names are right?
>
> That's certainly a tempting angle. However we may not "control" the
> server code -- couldn't there be some oddball (maybe even proprietary)
> NFS servers out there that users do care about interacting with?

Matt could you look at what it will take to do the right thing from
the network namespace side of things as well? I believe it is going
to require the same level of understanding of the interactions in the code
to get there.

For the network namespace we should cache it at mount or server
startup and use it until we are done. In a network namespace context
there are good reasons for that because talking to 10.0.0.1 on one
network may not be the same machine as talking to 10.0.0.1 on another
network. NFS reestablishes tcp connections if the connection to the
server breaks doesn't it? Or is that left to user space?

Eric

2009-01-07 01:03:25

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 16:57 -0800, Matt Helsley wrote:
> On Tue, 2009-01-06 at 19:39 -0500, [email protected] wrote:
> > On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> > > IMHO This seems more incorrect than trying to use a more proximal
> > namespace.
> >
> > You have yet to explain why.
>
> Yes, I know I have to explain why. If you would please have a little
> more patience you might see I have replied to your original request for
> an explanation -- I *just* sent it out. If I don't see it show up soon
> I'll resend it as a reply here. OK?

Sorry if you saw the request twice. My mail server at uio.no appears to
be on the blink, so I switched to the netapp account...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-01-07 01:10:27

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 16:43 -0800, Matt Helsley wrote:
> On Tue, 2009-01-06 at 19:20 -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> > > IMHO This seems more incorrect than trying to use a more proximal
> > > namespace.
> >
> > You have yet to explain why.
>
> It's "more proximal" -- it's closer to the container that we expect to
> cause (directly or otherwise) the bulk of the RPC calls for that mount.
> If the container does not wind up sharing that mount with other
> containers then the reported node name matches. If the container winds
> up sharing the mount with other containers then at least we can learn
> which container originated the mount.
>
> I imagine an NFS administrator trying to determine the source of a bunch
> of RPC calls. If we just report the initial namespace then that
> administrator has to do lots more digging to determine which container
> sent the calls (assuming they aren't in different network namespaces).

As I said elsewhere, the network namespaces do not matter, since we
always create the sockets via the rpciod workqueue.

> By not always reporting the initial namespace we may give the
> administrator one way to narrow down the search. Even if the reported
> node name does not perfectly match the source of all RPC traffic related
> to the mount at least the administrator gets something more specific.

The administrator would have to be running wireshark or something in
order to track the rpc calls. That's hardly a convenient interface, or a
compelling reason to add namespace info into RPC credentials. We
certainly haven't added any other information to the RPC calls in order
to allow the administrator to identify which process it originated from.

Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-01-07 01:23:16

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 20:02 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 16:57 -0800, Matt Helsley wrote:
> > On Tue, 2009-01-06 at 19:39 -0500, [email protected] wrote:
> > > On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> > > > IMHO This seems more incorrect than trying to use a more proximal
> > > namespace.
> > >
> > > You have yet to explain why.
> >
> > Yes, I know I have to explain why. If you would please have a little
> > more patience you might see I have replied to your original request for
> > an explanation -- I *just* sent it out. If I don't see it show up soon
> > I'll resend it as a reply here. OK?
>
> Sorry if you saw the request twice. My mail server at uio.no appears to
> be on the blink, so I switched to the netapp account...

Ahhh, ok. Sorry I misinterpretted that..

Cheers,
-Matt Helsley

2009-01-07 01:44:52

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, Jan 06, 2009 at 07:38:26PM -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 19:26 -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 07:20:07PM -0500, Trond Myklebust wrote:
> > > As for the NFSv4 clientid, I can't see how you would ever want to use
> > > anything other than the init->utsname(), since the requirement is only
> > > that the clientid string be unique and preserved across reboots. The
> > > server isn't allowed to interpret the contents of the clientid string.
> > > Ditto for the RPCSEC_GSS machine creds that are used to establish the
> > > clientid.
> >
> > If people eventually expect to be able to, say, migrate a container to
> > another host while using an nfs mount as their storage, then they'd need
> > the name to be that of the container, not of the host.
>
> Why?
>
> > Obviously we'd also need to ensure the container got its own nfsv4
> > client state, etc., etc., and it sounds like we're far from that.
>
> Again, why? Are you seriously proposing a plan to transport all NFS and
> locking state directly from one kernel to another?

If people seem to think they can do live process and container
migration:

http://ols.fedoraproject.org/OLS/Reprints-2008/mirkin-reprint.pdf

then moving the NFS state strikes me as not the greatest of their
troubles.

I'm not volunteering!

--b.

2009-01-07 01:50:56

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 20:44 -0500, J. Bruce Fields wrote:
> > Again, why? Are you seriously proposing a plan to transport all NFS and
> > locking state directly from one kernel to another?
>
> If people seem to think they can do live process and container
> migration:
>
> http://ols.fedoraproject.org/OLS/Reprints-2008/mirkin-reprint.pdf
>
> then moving the NFS state strikes me as not the greatest of their
> troubles.

ROTFL...


--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-01-07 02:47:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

"J. Bruce Fields" <[email protected]> writes:

> If people seem to think they can do live process and container
> migration:
>
> http://ols.fedoraproject.org/OLS/Reprints-2008/mirkin-reprint.pdf
>
> then moving the NFS state strikes me as not the greatest of their
> troubles.
>
> I'm not volunteering!

Reasonable. There are a lot easier cases than NFS, and migration is
not the biggest use case.

A lot of this work we can do incrementally and simply not support
doing everything in a container. Which is where we are today.

The whole hostname in NFS packets isn't something that makes much sense
to me in the first place, so I can't comment on a good use case for that.

Sorting out the details so we can get NFS working in multiple network namespaces
makes a lot of sense to me. It allows things like VPNing into work isolating the
VPN process in a network namespace, and still being able to use all of the kernel
networking including NFS like normal sounds fun to me.

There may also be some fun things you can do with testing having both an NFS
server and the client on the same box and not go through the loopback device.

Eric

2009-01-07 03:45:05

by Matt Helsley

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces

On Tue, 2009-01-06 at 19:23 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 19:20 -0500, J. Bruce Fields wrote:
> > If it would be possible, for example, for the 'init' namespace to have
> > no network interfaces at all, then it would be nicer to use a name
> > that's at least been used with nfs at *some* point--just on the general
> > principle of not leaking information to a domain that the user wouldn't
> > expect it to.
>
> Then RPC would fail. Thanks to the limitations imposed by selinux &
> friends, all RPC sockets have to be owned by the init process.

Interesting -- I'm not familiar with this requirement of selinux. Must
it be the init process of the initial pid namespace or could any pid
namespace's init process own it?

Cheers,
-Matt Helsley