Return-Path: linux-nfs-owner@vger.kernel.org Received: from mail-ie0-f171.google.com ([209.85.223.171]:47950 "EHLO mail-ie0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754436AbbA3Xt0 (ORCPT ); Fri, 30 Jan 2015 18:49:26 -0500 Received: by mail-ie0-f171.google.com with SMTP id tr6so6962740ieb.2 for ; Fri, 30 Jan 2015 15:49:25 -0800 (PST) Message-ID: <1422661761.21259.5.camel@primarydata.com> Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup From: Trond Myklebust To: Bruno =?ISO-8859-1?Q?Pr=E9mont?= , Nix Cc: "J. Bruce Fields" , "Eric W. Biederman" , Linux Kernel Mailing List , Linux NFS Mailing List Date: Fri, 30 Jan 2015 18:49:21 -0500 In-Reply-To: References: <20150125220604.090121ae@neptune.home> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Sender: linux-nfs-owner@vger.kernel.org List-ID: On Sun, 2015-01-25 at 16:55 -0500, Trond Myklebust wrote: > On Sun, Jan 25, 2015 at 4:06 PM, Bruno Prémont > wrote: > > On a system running home-brown container (mntns, utsns, pidns, netns) > > with NFS mount-point bind-mounted into the container I hit the following > > trace if nfs filesystem is first umount()ed in init ns and then later > > umounted from container when the container exists. > > > > [51397.767310] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 > > [51397.770671] IP: [] rpc_new_client+0x193/0x2b0 > > [51397.773967] PGD 0 > > [51397.777218] Oops: 0000 [#1] SMP > > [51397.780490] Modules linked in: > > [51397.783751] CPU: 0 PID: 1711 Comm: image-starter Not tainted 3.19.0-rc2-kvm+ #7 > > [51397.787123] Hardware name: Gigabyte Technology Co., Ltd. GA-A75M-UD2H/GA-A75M-UD2H, BIOS F6 09/28/2012 > > [51397.790606] task: ffff8800c9fcbac0 ti: ffff8801fe754000 task.ti: ffff8801fe754000 > > [51397.794149] RIP: 0010:[] [] rpc_new_client+0x193/0x2b0 > > [51397.797798] RSP: 0018:ffff8801fe757908 EFLAGS: 00010246 > > [51397.801444] RAX: 0000000000000000 RBX: ffff88009dafb240 RCX: 0000000000000180 > > [51397.805174] RDX: 000000000000bae0 RSI: 0000000000001770 RDI: ffff88009dafb308 > > [51397.808913] RBP: ffff8801fe757948 R08: ffff88009daf92d8 R09: ffff88009dafb458 > > [51397.812673] R10: ffff88009dafb458 R11: ffff88020ec15bc0 R12: ffff8801fe757a40 > > [51397.816456] R13: ffffffff81b9d800 R14: ffff8800c6e31030 R15: 0000000000000000 > > [51397.820270] FS: 00007f335a3a1700(0000) GS:ffff88020ec00000(0000) knlGS:00000000f7287700 > > [51397.824168] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > [51397.828066] CR2: 0000000000000008 CR3: 00000001fc54d000 CR4: 00000000000007f0 > > [51397.832017] Stack: > > [51397.835924] 000000000000000a ffffffff81b9d770 ffffffff81826450 ffff8801fe757a40 > > [51397.840023] ffff8801fe757a40 ffff8800cf08d500 ffffffff81826450 ffffffff820f4728 > > [51397.844130] ffff8801fe757978 ffffffff81828815 ffff8801fe757978 ffffffff8182aad8 > > [51397.848224] Call Trace: > > [51397.852221] [] ? call_start+0x20/0x20 > > [51397.856273] [] ? call_start+0x20/0x20 > > [51397.860295] [] rpc_create_xprt+0x15/0xb0 > > [51397.864324] [] ? xprt_create_transport+0x108/0x1b0 > > [51397.868428] [] rpc_create+0xc1/0x190 > > [51397.872574] [] ? internal_add_timer+0x66/0x80 > > [51397.876733] [] ? mod_timer+0x109/0x1e0 > > [51397.880877] [] rpcb_create+0x6e/0x90 > > [51397.884999] [] rpcb_getport_async+0x15a/0x330 > > [51397.889118] [] ? rpc_malloc+0x3a/0x70 > > [51397.893240] [] ? __kmalloc+0xc2/0x170 > > [51397.897354] [] ? call_reserveresult+0x110/0x110 > > [51397.901490] [] ? call_start+0x20/0x20 > > [51397.905606] [] ? call_start+0x20/0x20 > > [51397.909662] [] call_bind+0x3e/0x40 > > [51397.913709] [] __rpc_execute+0x79/0x330 > > [51397.917778] [] rpc_execute+0x5d/0xa0 > > [51397.921871] [] rpc_run_task+0x6b/0x90 > > [51397.925989] [] rpc_call_sync+0x3e/0xa0 > > [51397.930108] [] nsm_mon_unmon+0xb9/0xd0 > > [51397.934191] [] ? call_rcu_bh+0x20/0x20 > > [51397.938235] [] nsm_unmonitor+0x8c/0x140 > > [51397.942309] [] nlm_destroy_host_locked+0x63/0xa0 > > [51397.946442] [] nlmclnt_release_host+0x7c/0x130 > > [51397.950591] [] nlmclnt_done+0x15/0x30 > > [51397.954773] [] nfs_destroy_server+0x12/0x20 > > [51397.958934] [] nfs_free_server+0x22/0xa0 > > [51397.963053] [] nfs_kill_super+0x1d/0x30 > > [51397.967158] [] deactivate_locked_super+0x4c/0x70 > > [51397.971286] [] deactivate_super+0x49/0x70 > > [51397.975398] [] cleanup_mnt+0x3e/0x90 > > [51397.979499] [] __cleanup_mnt+0xd/0x10 > > [51397.983598] [] task_work_run+0xbc/0xe0 > > [51397.987697] [] do_exit+0x295/0xaf0 > > [51397.991812] [] ? ____fput+0x9/0x10 > > [51397.995937] [] ? task_work_run+0xa4/0xe0 > > [51398.000070] [] do_group_exit+0x3a/0xa0 > > [51398.004201] [] SyS_exit_group+0xf/0x10 > > [51398.008315] [] system_call_fastpath+0x12/0x17 > > [51398.012438] Code: 43 78 48 8d bb c8 00 00 00 48 89 7b 70 48 8b 30 e8 63 2d 01 00 c7 03 01 00 00 00 65 48 8b 04 25 00 aa 00 00 48 8b 80 c0 09 00 00 <4c> 8b 68 08 49 83 c5 45 4 > > [51398.022378] RIP [] rpc_new_client+0x193/0x2b0 > > [51398.026732] RSP > > [51398.031025] CR2: 0000000000000008 > > [51398.035326] ---[ end trace b701b037bc457620 ]--- > > [51398.058223] Fixing recursive fault but reboot is needed! > > We should rather change rpcb_create() to pass the nodename from the > parent. The point is that the rpc_clnt->cl_nodename is used in various > different contexts (for instance in the AUTH_SYS credential) where it > isn't always appropriate to have it set to an empty string. I was rather hoping that Bruno would fix up his patch and resend, but since other reports of the same bug are now surfacing... Please could you all check if something like the following patch fixes it. Thanks Trond 8<--------------------------------------------------------------------- >From 87557df0ca2241da0edac558286650fdb081152c Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 30 Jan 2015 18:12:28 -0500 Subject: [PATCH] SUNRPC: NULL utsname dereference on NFS umount during namespace cleanup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix an Oopsable condition when nsm_mon_unmon is called as part of the namespace cleanup, which now apparently happens after the utsname has been freed. Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home Reported-by: Bruno Prémont Cc: stable@vger.kernel.org # 3.18 Signed-off-by: Trond Myklebust --- fs/lockd/mon.c | 13 +++++++++---- include/linux/sunrpc/clnt.h | 3 ++- net/sunrpc/clnt.c | 12 +++++++----- net/sunrpc/rpcb_clnt.c | 8 ++++++-- 4 files changed, 24 insertions(+), 12 deletions(-) diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c index 1cc6ec51e6b1..47a32b6d9b90 100644 --- a/fs/lockd/mon.c +++ b/fs/lockd/mon.c @@ -65,7 +65,7 @@ static inline struct sockaddr *nsm_addr(const struct nsm_handle *nsm) return (struct sockaddr *)&nsm->sm_addr; } -static struct rpc_clnt *nsm_create(struct net *net) +static struct rpc_clnt *nsm_create(struct net *net, const char *nodename) { struct sockaddr_in sin = { .sin_family = AF_INET, @@ -77,6 +77,7 @@ static struct rpc_clnt *nsm_create(struct net *net) .address = (struct sockaddr *)&sin, .addrsize = sizeof(sin), .servername = "rpc.statd", + .nodename = nodename, .program = &nsm_program, .version = NSM_VERSION, .authflavor = RPC_AUTH_NULL, @@ -102,7 +103,7 @@ out: return clnt; } -static struct rpc_clnt *nsm_client_get(struct net *net) +static struct rpc_clnt *nsm_client_get(struct net *net, const char *nodename) { struct rpc_clnt *clnt, *new; struct lockd_net *ln = net_generic(net, lockd_net_id); @@ -111,7 +112,7 @@ static struct rpc_clnt *nsm_client_get(struct net *net) if (clnt != NULL) goto out; - clnt = new = nsm_create(net); + clnt = new = nsm_create(net, nodename); if (IS_ERR(clnt)) goto out; @@ -190,19 +191,23 @@ int nsm_monitor(const struct nlm_host *host) struct nsm_res res; int status; struct rpc_clnt *clnt; + const char *nodename = NULL; dprintk("lockd: nsm_monitor(%s)\n", nsm->sm_name); if (nsm->sm_monitored) return 0; + if (host->h_rpcclnt) + nodename = host->h_rpcclnt->cl_nodename; + /* * Choose whether to record the caller_name or IP address of * this peer in the local rpc.statd's database. */ nsm->sm_mon_name = nsm_use_hostnames ? nsm->sm_name : nsm->sm_addrbuf; - clnt = nsm_client_get(host->net); + clnt = nsm_client_get(host->net, nodename); if (IS_ERR(clnt)) { status = PTR_ERR(clnt); dprintk("lockd: failed to create NSM upcall transport, " diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index d86acc63b25f..598ba80ec30c 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -57,7 +57,7 @@ struct rpc_clnt { const struct rpc_timeout *cl_timeout; /* Timeout strategy */ int cl_nodelen; /* nodename length */ - char cl_nodename[UNX_MAXNODENAME]; + char cl_nodename[UNX_MAXNODENAME+1]; struct rpc_pipe_dir_head cl_pipedir_objects; struct rpc_clnt * cl_parent; /* Points to parent of clones */ struct rpc_rtt cl_rtt_default; @@ -112,6 +112,7 @@ struct rpc_create_args { struct sockaddr *saddress; const struct rpc_timeout *timeout; const char *servername; + const char *nodename; const struct rpc_program *program; u32 prognumber; /* overrides program->number */ u32 version; diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 05da12a33945..3f5d4d48f0cb 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -286,10 +286,8 @@ static struct rpc_xprt *rpc_clnt_set_transport(struct rpc_clnt *clnt, static void rpc_clnt_set_nodename(struct rpc_clnt *clnt, const char *nodename) { - clnt->cl_nodelen = strlen(nodename); - if (clnt->cl_nodelen > UNX_MAXNODENAME) - clnt->cl_nodelen = UNX_MAXNODENAME; - memcpy(clnt->cl_nodename, nodename, clnt->cl_nodelen); + clnt->cl_nodelen = strlcpy(clnt->cl_nodename, + nodename, sizeof(clnt->cl_nodename)); } static int rpc_client_register(struct rpc_clnt *clnt, @@ -365,6 +363,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, const struct rpc_version *version; struct rpc_clnt *clnt = NULL; const struct rpc_timeout *timeout; + const char *nodename = args->nodename; int err; /* sanity check the name before trying to print it */ @@ -420,8 +419,10 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, atomic_set(&clnt->cl_count, 1); + if (nodename == NULL) + nodename = utsname()->nodename; /* save the nodename */ - rpc_clnt_set_nodename(clnt, utsname()->nodename); + rpc_clnt_set_nodename(clnt, nodename); err = rpc_client_register(clnt, args->authflavor, args->client_name); if (err) @@ -576,6 +577,7 @@ static struct rpc_clnt *__rpc_clone_client(struct rpc_create_args *args, if (xprt == NULL) goto out_err; args->servername = xprt->servername; + args->nodename = clnt->cl_nodename; new = rpc_new_client(args, xprt, clnt); if (IS_ERR(new)) { diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c index 05202012bcfc..cf5770d8f49a 100644 --- a/net/sunrpc/rpcb_clnt.c +++ b/net/sunrpc/rpcb_clnt.c @@ -355,7 +355,8 @@ out: return result; } -static struct rpc_clnt *rpcb_create(struct net *net, const char *hostname, +static struct rpc_clnt *rpcb_create(struct net *net, const char *nodename, + const char *hostname, struct sockaddr *srvaddr, size_t salen, int proto, u32 version) { @@ -365,6 +366,7 @@ static struct rpc_clnt *rpcb_create(struct net *net, const char *hostname, .address = srvaddr, .addrsize = salen, .servername = hostname, + .nodename = nodename, .program = &rpcb_program, .version = version, .authflavor = RPC_AUTH_UNIX, @@ -740,7 +742,9 @@ void rpcb_getport_async(struct rpc_task *task) dprintk("RPC: %5u %s: trying rpcbind version %u\n", task->tk_pid, __func__, bind_version); - rpcb_clnt = rpcb_create(xprt->xprt_net, xprt->servername, sap, salen, + rpcb_clnt = rpcb_create(xprt->xprt_net, + clnt->cl_nodename, + xprt->servername, sap, salen, xprt->prot, bind_version); if (IS_ERR(rpcb_clnt)) { status = PTR_ERR(rpcb_clnt); -- 2.1.0