From: Matt Helsley Subject: Re: [RFC][PATCH 3/4] sunrpc: Improve UTS namespace workaround Date: Tue, 06 Jan 2009 16:28:03 -0800 Message-ID: <1231288083.14345.215.camel@localhost> References: <20090106011314.534653345@us.ibm.com> <20090106011315.100081168@us.ibm.com> Mime-Version: 1.0 Content-Type: text/plain Cc: Linux Containers , "J. Bruce Fields" , Cedric Le Goater , Linux Kernel Mailing List , linux-nfs@vger.kernel.org, Trond Myklebust , "Eric W. Biederman" , Linux Containers To: Chuck Lever Return-path: Received: from e32.co.us.ibm.com ([32.97.110.150]:36715 "EHLO e32.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751280AbZAGA2I (ORCPT ); Tue, 6 Jan 2009 19:28:08 -0500 In-Reply-To: Sender: linux-nfs-owner@vger.kernel.org List-ID: On Tue, 2009-01-06 at 11:02 -0500, Chuck Lever wrote: > Matt- > > Thanks for pursuing a permanent fix for this. > > On Jan 5, 2009, at Jan 5, 2009, 8:13 PM, Matt Helsley wrote: > > > We can improve upon a workaround applied in commit > > 63ffc23d307c9534c732edd87895e37b223004a3 ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=63ffc23d307c9534c732edd87895e37b223004a3 > > ) > > > > The original problem was: > > > > "On a system with nfs mounts, if a task unshares its mount namespace, > > a oops can occur when the system is rebooted if the task is the last > > to unreference the nfs mount. It will try to create a rpc request > > using utsname() which has been invalidated by free_nsproxy()." > > > > Cedric worked around this by always using the initial uts namespace > > for RPC. > > Critically this workaround meant that RPC clients in uts namespaces > > can never > > report the changed nodename when utilizing RPC. > > > > Fix that by storing the nodename in the NFS server structure (part > > of the NFS > > super block) and, when an RPC client is operating on behalf of NFS, > > reporting > > that nodename. This solves the problem for NFS clients but leaves > > any other > > RPC users out in the cold. > > > > Rather than caching the nodename in the client structure RPC should > > obtain the > > nodename from RPC callers. It would then be up to those services > > making RPC > > calls to cache the nodename for as long as necessary -- somewhat > > like this patch > > does with NFS. > > Instead of having the RPC client call the consumer back, why can't you > pass the nodename as an argument to each RPC call; say, via the > rpc_message structure? > > > NOTE: Part of Cedric's workaround -- use of the initial uts > > namespace -- is > > still necessary because non-NFS RPC callers still rely on the > > nodename cached > > with the RPC client struct. > > In the long run I think it would be more useful to spell out where > each consumer gets its nodename value, rather than having a convenient > default value. A default would encourage exposing nodenames > inappropriately due to sloppy coding and incorrect assumptions (on the > developer's part) about a complex API. > > Where would NFSv4 callbacks get a nodename, for example? And what > about rpcbind requests? > > > Thoughts? > > > More comments below. > > > Signed-off-by: Matt Helsley > > Cc: Cedric Le Goater > > Cc: Linux Kernel Mailing List > > Cc: linux-nfs@vger.kernel.org > > Cc: Trond Myklebust > > Cc: Chuck Lever > > Cc: Eric W. Biederman > > Cc: Linux Containers > > > > --- > > fs/lockd/clntproc.c | 16 +++++++++++++--- > > fs/nfs/client.c | 6 ++++++ > > fs/nfs/super.c | 6 ++++++ > > include/linux/nfs_fs.h | 2 ++ > > include/linux/nfs_fs_sb.h | 3 +++ > > include/linux/sunrpc/clnt.h | 2 ++ > > net/sunrpc/auth_unix.c | 13 ++++++++++++- > > 7 files changed, 44 insertions(+), 4 deletions(-) > > > > Index: linux-2.6.28/include/linux/sunrpc/clnt.h > > =================================================================== > > --- linux-2.6.28.orig/include/linux/sunrpc/clnt.h > > +++ linux-2.6.28/include/linux/sunrpc/clnt.h > > @@ -19,6 +19,7 @@ > > #include > > > > struct rpc_inode; > > +struct nfs_server; > > > > /* > > * The high-level client handle > > @@ -50,6 +51,7 @@ struct rpc_clnt { > > > > int cl_nodelen; /* nodename length */ > > char cl_nodename[UNX_MAXNODENAME]; > > + struct nfs_server *cl_nfs_server; > > This is a layering violation... I would rather avoid introducing new > strong data structure dependencies on one of RPC's consumers. Good point. I was hoping the RPC caller structure I introduced in patch 4 would be, if not pretty, at least more-acceptable replacement for this hack. However it seems this series has more fundamental issues that need to be answered before I can proceed.. > > char cl_pathname[30];/* Path in rpc_pipe_fs */ > > struct vfsmount * cl_vfsmnt; > > struct dentry * cl_dentry; /* inode */ > > Index: linux-2.6.28/net/sunrpc/auth_unix.c > > =================================================================== > > --- linux-2.6.28.orig/net/sunrpc/auth_unix.c > > +++ linux-2.6.28/net/sunrpc/auth_unix.c > > @@ -12,6 +12,13 @@ > > #include > > #include > > > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > #define NFS_NGROUPS 16 > > > > struct unx_cred { > > @@ -151,7 +158,11 @@ unx_marshal(struct rpc_task *task, __be3 > > /* > > * Copy the UTS nodename captured when the client was created. > > */ > > - p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen); > > + if (clnt->cl_nfs_server) > > + p = xdr_encode_array(p, clnt->cl_nfs_server->nodename, > > + clnt->cl_nfs_server->nodelen); > > + else > > + p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen); > > > > *p++ = htonl((u32) cred->uc_uid); > > *p++ = htonl((u32) cred->uc_gid); > > Index: linux-2.6.28/include/linux/nfs_fs_sb.h > > =================================================================== > > --- linux-2.6.28.orig/include/linux/nfs_fs_sb.h > > +++ linux-2.6.28/include/linux/nfs_fs_sb.h > > @@ -126,6 +126,9 @@ struct nfs_server { > > u32 mountd_version; > > unsigned short mountd_port; > > unsigned short mountd_protocol; > > + > > + int nodelen; /* nodename length */ > > + char nodename[UNX_MAXNODENAME]; > > }; > > > > /* Server capabilities */ > > Index: linux-2.6.28/fs/nfs/super.c > > =================================================================== > > --- linux-2.6.28.orig/fs/nfs/super.c > > +++ linux-2.6.28/fs/nfs/super.c > > @@ -51,6 +51,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -1830,6 +1831,11 @@ static void nfs_fill_super(struct super_ > > sb->s_time_gran = 1; > > } > > > > + server->nodelen = strlen(utsname()->nodename); > > + if (server->nodelen > UNX_MAXNODENAME) > > + server->nodelen = UNX_MAXNODENAME; > > + memcpy(server->nodename, utsname()->nodename, server->nodelen); > > + > > sb->s_op = &nfs_sops; > > nfs_initialise_sb(sb); > > } > > Index: linux-2.6.28/fs/nfs/client.c > > =================================================================== > > --- linux-2.6.28.orig/fs/nfs/client.c > > +++ linux-2.6.28/fs/nfs/client.c > > @@ -25,10 +25,12 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -47,6 +49,7 @@ > > #include "internal.h" > > > > #define NFSDBG_FACILITY NFSDBG_CLIENT > > +#define NLMDBG_FACILITY NLMDBG_CLIENT > > > > static DEFINE_SPINLOCK(nfs_client_lock); > > static LIST_HEAD(nfs_client_list); > > @@ -555,6 +558,7 @@ static void nfs_init_server_aclclient(st > > > > /* No errors! Assume that Sun nfsacls are supported */ > > server->caps |= NFS_CAP_ACLS; > > + server->client_acl->cl_nfs_server = server; > > return; > > > > out_noacl: > > @@ -673,6 +677,7 @@ static int nfs_init_server(struct nfs_se > > goto error; > > > > server->nfs_client = clp; > > + clp->cl_rpcclient->cl_nfs_server = server; > > > > /* Initialise the client representation from the mount data */ > > server->flags = data->flags; > > @@ -1035,6 +1040,7 @@ static int nfs4_set_client(struct nfs_se > > goto error_put; > > > > server->nfs_client = clp; > > + clp->cl_rpcclient->cl_nfs_server = server; > > dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp); > > return 0; > > > > Index: linux-2.6.28/fs/lockd/clntproc.c > > =================================================================== > > --- linux-2.6.28.orig/fs/lockd/clntproc.c > > +++ linux-2.6.28/fs/lockd/clntproc.c > > @@ -10,8 +10,8 @@ > > #include > > #include > > #include > > +#include > > #include > > -#include > > #include > > #include > > #include > > @@ -118,6 +118,11 @@ static struct nlm_lockowner *nlm_find_lo > > return res; > > } > > > > +struct nfs_server *fl_nfs_server(struct file_lock *fl) > > +{ > > + return NFS_SB(fl->fl_file->f_path.mnt->mnt_sb); > > +} > > + > > /* > > * Initialize arguments for TEST/LOCK/UNLOCK/CANCEL calls > > */ > > @@ -125,15 +130,18 @@ static void nlmclnt_setlockargs(struct n > > { > > struct nlm_args *argp = &req->a_args; > > struct nlm_lock *lock = &argp->lock; > > + char *nodename; > > > > nlmclnt_next_cookie(&argp->cookie); > > argp->state = nsm_local_state; > > memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode), > > sizeof(struct nfs_fh)); > > - lock->caller = utsname()->nodename; > > + > > + nodename = fl_nfs_server(fl)->nodename; > > + lock->caller = nodename; > > lock->oh.data = req->a_owner; > > lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s", > > (unsigned int)fl->fl_u.nfs_fl.owner->pid, > > - utsname()->nodename); > > + nodename); > > lock->svid = fl->fl_u.nfs_fl.owner->pid; > > lock->fl.fl_start = fl->fl_start; > > lock->fl.fl_end = fl->fl_end; > > @@ -272,6 +280,7 @@ nlmclnt_call(struct rpc_cred *cred, stru > > /* If we have no RPC client yet, create one. */ > > if ((clnt = nlm_bind_host(host)) == NULL) > > return -ENOLCK; > > + clnt->cl_nfs_server = fl_nfs_server(&argp->lock.fl); > > msg.rpc_proc = &clnt->cl_procinfo[proc]; > > > > /* Perform the RPC call. If an error occurs, try again */ > > @@ -344,6 +353,7 @@ static struct rpc_task *__nlm_async_call > > clnt = nlm_bind_host(host); > > if (clnt == NULL) > > goto out_err; > > + clnt->cl_nfs_server = fl_nfs_server(&req->a_args.lock.fl); > > This change takes care of fixing NFS client locking requests, but the > NFS server also makes lockd callbacks to the client. You won't have > an nfs_server handle on the NFS server side. True -- this series doesn't address the server side. Depending on how this discussion goes I may end up visiting that code too. > Generally, you ought to pass just the nodename where needed, not a > pointer to the nfs_server. Passing the nfs server around is a temporary measure. It's meant to make the following patch more palatable because patch 4 removes the code passing around the nfs server pointer and adds the RPC caller structure in its place. Cheers, -Matt