2015-01-25 21:23:37

by Bruno Prémont

Subject: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On a system running a home-grown container (mntns, utsns, pidns, netns)
with an NFS mount point bind-mounted into the container, I hit the following
trace if the NFS filesystem is first umount()ed in the init namespace and
then umounted from the container when the container exits.

[51397.767310] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[51397.770671] IP: [<ffffffff81828173>] rpc_new_client+0x193/0x2b0
[51397.773967] PGD 0
[51397.777218] Oops: 0000 [#1] SMP
[51397.780490] Modules linked in:
[51397.783751] CPU: 0 PID: 1711 Comm: image-starter Not tainted 3.19.0-rc2-kvm+ #7
[51397.787123] Hardware name: Gigabyte Technology Co., Ltd. GA-A75M-UD2H/GA-A75M-UD2H, BIOS F6 09/28/2012
[51397.790606] task: ffff8800c9fcbac0 ti: ffff8801fe754000 task.ti: ffff8801fe754000
[51397.794149] RIP: 0010:[<ffffffff81828173>] [<ffffffff81828173>] rpc_new_client+0x193/0x2b0
[51397.797798] RSP: 0018:ffff8801fe757908 EFLAGS: 00010246
[51397.801444] RAX: 0000000000000000 RBX: ffff88009dafb240 RCX: 0000000000000180
[51397.805174] RDX: 000000000000bae0 RSI: 0000000000001770 RDI: ffff88009dafb308
[51397.808913] RBP: ffff8801fe757948 R08: ffff88009daf92d8 R09: ffff88009dafb458
[51397.812673] R10: ffff88009dafb458 R11: ffff88020ec15bc0 R12: ffff8801fe757a40
[51397.816456] R13: ffffffff81b9d800 R14: ffff8800c6e31030 R15: 0000000000000000
[51397.820270] FS: 00007f335a3a1700(0000) GS:ffff88020ec00000(0000) knlGS:00000000f7287700
[51397.824168] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[51397.828066] CR2: 0000000000000008 CR3: 00000001fc54d000 CR4: 00000000000007f0
[51397.832017] Stack:
[51397.835924] 000000000000000a ffffffff81b9d770 ffffffff81826450 ffff8801fe757a40
[51397.840023] ffff8801fe757a40 ffff8800cf08d500 ffffffff81826450 ffffffff820f4728
[51397.844130] ffff8801fe757978 ffffffff81828815 ffff8801fe757978 ffffffff8182aad8
[51397.848224] Call Trace:
[51397.852221] [<ffffffff81826450>] ? call_start+0x20/0x20
[51397.856273] [<ffffffff81826450>] ? call_start+0x20/0x20
[51397.860295] [<ffffffff81828815>] rpc_create_xprt+0x15/0xb0
[51397.864324] [<ffffffff8182aad8>] ? xprt_create_transport+0x108/0x1b0
[51397.868428] [<ffffffff81828971>] rpc_create+0xc1/0x190
[51397.872574] [<ffffffff81111c86>] ? internal_add_timer+0x66/0x80
[51397.876733] [<ffffffff81113a99>] ? mod_timer+0x109/0x1e0
[51397.880877] [<ffffffff8183a19e>] rpcb_create+0x6e/0x90
[51397.884999] [<ffffffff8183a71a>] rpcb_getport_async+0x15a/0x330
[51397.889118] [<ffffffff8182f1da>] ? rpc_malloc+0x3a/0x70
[51397.893240] [<ffffffff811af8d2>] ? __kmalloc+0xc2/0x170
[51397.897354] [<ffffffff81826830>] ? call_reserveresult+0x110/0x110
[51397.901490] [<ffffffff81826450>] ? call_start+0x20/0x20
[51397.905606] [<ffffffff81826450>] ? call_start+0x20/0x20
[51397.909662] [<ffffffff8182648e>] call_bind+0x3e/0x40
[51397.913709] [<ffffffff8182fa99>] __rpc_execute+0x79/0x330
[51397.917778] [<ffffffff818327bd>] rpc_execute+0x5d/0xa0
[51397.921871] [<ffffffff818286cb>] rpc_run_task+0x6b/0x90
[51397.925989] [<ffffffff8182872e>] rpc_call_sync+0x3e/0xa0
[51397.930108] [<ffffffff8127fe29>] nsm_mon_unmon+0xb9/0xd0
[51397.934191] [<ffffffff8110e2a0>] ? call_rcu_bh+0x20/0x20
[51397.938235] [<ffffffff8128018c>] nsm_unmonitor+0x8c/0x140
[51397.942309] [<ffffffff8127bc43>] nlm_destroy_host_locked+0x63/0xa0
[51397.946442] [<ffffffff8127c03c>] nlmclnt_release_host+0x7c/0x130
[51397.950591] [<ffffffff81279645>] nlmclnt_done+0x15/0x30
[51397.954773] [<ffffffff81241862>] nfs_destroy_server+0x12/0x20
[51397.958934] [<ffffffff81242372>] nfs_free_server+0x22/0xa0
[51397.963053] [<ffffffff8124cadd>] nfs_kill_super+0x1d/0x30
[51397.967158] [<ffffffff811c2e2c>] deactivate_locked_super+0x4c/0x70
[51397.971286] [<ffffffff811c33f9>] deactivate_super+0x49/0x70
[51397.975398] [<ffffffff811ddafe>] cleanup_mnt+0x3e/0x90
[51397.979499] [<ffffffff811ddb9d>] __cleanup_mnt+0xd/0x10
[51397.983598] [<ffffffff810e04cc>] task_work_run+0xbc/0xe0
[51397.987697] [<ffffffff810c8f95>] do_exit+0x295/0xaf0
[51397.991812] [<ffffffff811c2239>] ? ____fput+0x9/0x10
[51397.995937] [<ffffffff810e04b4>] ? task_work_run+0xa4/0xe0
[51398.000070] [<ffffffff810c986a>] do_group_exit+0x3a/0xa0
[51398.004201] [<ffffffff810c98df>] SyS_exit_group+0xf/0x10
[51398.008315] [<ffffffff8185e8d2>] system_call_fastpath+0x12/0x17
[51398.012438] Code: 43 78 48 8d bb c8 00 00 00 48 89 7b 70 48 8b 30 e8 63 2d 01 00 c7 03 01 00 00 00 65 48 8b 04 25 00 aa 00 00 48 8b 80 c0 09 00 00 <4c> 8b 68 08 49 83 c5 45 4
[51398.022378] RIP [<ffffffff81828173>] rpc_new_client+0x193/0x2b0
[51398.026732] RSP <ffff8801fe757908>
[51398.031025] CR2: 0000000000000008
[51398.035326] ---[ end trace b701b037bc457620 ]---
[51398.058223] Fixing recursive fault but reboot is needed!


The code executed in rpc_new_client() dereferences the
struct new_utsname * returned by utsname(), which has already been
released by this time.

Cc: <[email protected]> # 3.18
Signed-off-by: Bruno Prémont <[email protected]>
---
I've seen this trace on 3.18.x as well as 3.19-rc.


This patch fixes the NULL dereference, but I'm not sure it is the
right fix.
Should the init UTS namespace be used instead, and what is the impact of
using an arbitrary string (here the empty string) to satisfy
rpc_clnt_set_nodename()?

Thanks,
Bruno
---
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 05da12a..0246e94 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -365,6 +365,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
const struct rpc_version *version;
struct rpc_clnt *clnt = NULL;
const struct rpc_timeout *timeout;
+ const struct new_utsname *uts_name;
int err;

/* sanity check the name before trying to print it */
@@ -421,7 +422,8 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
atomic_set(&clnt->cl_count, 1);

/* save the nodename */
- rpc_clnt_set_nodename(clnt, utsname()->nodename);
+ uts_name = utsname();
+ rpc_clnt_set_nodename(clnt, uts_name ? uts_name->nodename : "");

err = rpc_client_register(clnt, args->authflavor, args->client_name);
if (err)


2015-01-25 21:55:22

by Trond Myklebust

Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On Sun, Jan 25, 2015 at 4:06 PM, Bruno Prémont
<[email protected]> wrote:
> On a system running a home-grown container (mntns, utsns, pidns, netns)
> with an NFS mount point bind-mounted into the container, I hit the following
> trace if the NFS filesystem is first umount()ed in the init namespace and
> then umounted from the container when the container exits.
>
> [... oops trace quoted in full above, snipped ...]
>
>
> The code executed in rpc_new_client() tries to dereference the
> struct new_utsname * returned by utsname() which has already been
> released at this time.
>
> Cc: <[email protected]> # 3.18
> Signed-off-by: Bruno Prémont <[email protected]>
> ---
> I've seen this trace on 3.18.x as well as 3.19-rc.
>
>
> This patch fixes the NULL dereference but I'm not sure this is the
> right fix.
> Should init uts_ns be referred to or what is the impact of using a random
> string to satisfy rpc_clnt_set_nodename()?
>
> Thanks,
> Bruno
> ---
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 05da12a..0246e94 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -365,6 +365,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
> const struct rpc_version *version;
> struct rpc_clnt *clnt = NULL;
> const struct rpc_timeout *timeout;
> + const struct new_utsname *uts_name;
> int err;
>
> /* sanity check the name before trying to print it */
> @@ -421,7 +422,8 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
> atomic_set(&clnt->cl_count, 1);
>
> /* save the nodename */
> - rpc_clnt_set_nodename(clnt, utsname()->nodename);
> + uts_name = utsname();
> + rpc_clnt_set_nodename(clnt, uts_name ? uts_name->nodename : "");
>
> err = rpc_client_register(clnt, args->authflavor, args->client_name);
> if (err)

We should rather change rpcb_create() to pass the nodename from the
parent. The point is that the rpc_clnt->cl_nodename is used in various
different contexts (for instance in the AUTH_SYS credential) where it
isn't always appropriate to have it set to an empty string.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2015-01-30 23:49:26

by Trond Myklebust

Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On Sun, 2015-01-25 at 16:55 -0500, Trond Myklebust wrote:
> On Sun, Jan 25, 2015 at 4:06 PM, Bruno Prémont
> <[email protected]> wrote:
> > On a system running a home-grown container (mntns, utsns, pidns, netns)
> > with an NFS mount point bind-mounted into the container, I hit the
> > following trace if the NFS filesystem is first umount()ed in the init
> > namespace and then umounted from the container when the container exits.
> >
> > [... oops trace quoted in full above, snipped ...]
>
> We should rather change rpcb_create() to pass the nodename from the
> parent. The point is that the rpc_clnt->cl_nodename is used in various
> different contexts (for instance in the AUTH_SYS credential) where it
> isn't always appropriate to have it set to an empty string.

I was rather hoping that Bruno would fix up his patch and resend, but
since other reports of the same bug are now surfacing... Please could
you all check if something like the following patch fixes it.

Thanks
Trond

8<---------------------------------------------------------------------
From 87557df0ca2241da0edac558286650fdb081152c Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Fri, 30 Jan 2015 18:12:28 -0500
Subject: [PATCH] SUNRPC: NULL utsname dereference on NFS umount during
namespace cleanup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix an Oopsable condition when nsm_mon_unmon is called as part of the
namespace cleanup, which now apparently happens after the utsname
has been freed.

Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Bruno Prémont <[email protected]>
Cc: [email protected] # 3.18
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/lockd/mon.c | 13 +++++++++----
include/linux/sunrpc/clnt.h | 3 ++-
net/sunrpc/clnt.c | 12 +++++++-----
net/sunrpc/rpcb_clnt.c | 8 ++++++--
4 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index 1cc6ec51e6b1..47a32b6d9b90 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -65,7 +65,7 @@ static inline struct sockaddr *nsm_addr(const struct nsm_handle *nsm)
return (struct sockaddr *)&nsm->sm_addr;
}

-static struct rpc_clnt *nsm_create(struct net *net)
+static struct rpc_clnt *nsm_create(struct net *net, const char *nodename)
{
struct sockaddr_in sin = {
.sin_family = AF_INET,
@@ -77,6 +77,7 @@ static struct rpc_clnt *nsm_create(struct net *net)
.address = (struct sockaddr *)&sin,
.addrsize = sizeof(sin),
.servername = "rpc.statd",
+ .nodename = nodename,
.program = &nsm_program,
.version = NSM_VERSION,
.authflavor = RPC_AUTH_NULL,
@@ -102,7 +103,7 @@ out:
return clnt;
}

-static struct rpc_clnt *nsm_client_get(struct net *net)
+static struct rpc_clnt *nsm_client_get(struct net *net, const char *nodename)
{
struct rpc_clnt *clnt, *new;
struct lockd_net *ln = net_generic(net, lockd_net_id);
@@ -111,7 +112,7 @@ static struct rpc_clnt *nsm_client_get(struct net *net)
if (clnt != NULL)
goto out;

- clnt = new = nsm_create(net);
+ clnt = new = nsm_create(net, nodename);
if (IS_ERR(clnt))
goto out;

@@ -190,19 +191,23 @@ int nsm_monitor(const struct nlm_host *host)
struct nsm_res res;
int status;
struct rpc_clnt *clnt;
+ const char *nodename = NULL;

dprintk("lockd: nsm_monitor(%s)\n", nsm->sm_name);

if (nsm->sm_monitored)
return 0;

+ if (host->h_rpcclnt)
+ nodename = host->h_rpcclnt->cl_nodename;
+
/*
* Choose whether to record the caller_name or IP address of
* this peer in the local rpc.statd's database.
*/
nsm->sm_mon_name = nsm_use_hostnames ? nsm->sm_name : nsm->sm_addrbuf;

- clnt = nsm_client_get(host->net);
+ clnt = nsm_client_get(host->net, nodename);
if (IS_ERR(clnt)) {
status = PTR_ERR(clnt);
dprintk("lockd: failed to create NSM upcall transport, "
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index d86acc63b25f..598ba80ec30c 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -57,7 +57,7 @@ struct rpc_clnt {
const struct rpc_timeout *cl_timeout; /* Timeout strategy */

int cl_nodelen; /* nodename length */
- char cl_nodename[UNX_MAXNODENAME];
+ char cl_nodename[UNX_MAXNODENAME+1];
struct rpc_pipe_dir_head cl_pipedir_objects;
struct rpc_clnt * cl_parent; /* Points to parent of clones */
struct rpc_rtt cl_rtt_default;
@@ -112,6 +112,7 @@ struct rpc_create_args {
struct sockaddr *saddress;
const struct rpc_timeout *timeout;
const char *servername;
+ const char *nodename;
const struct rpc_program *program;
u32 prognumber; /* overrides program->number */
u32 version;
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 05da12a33945..3f5d4d48f0cb 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -286,10 +286,8 @@ static struct rpc_xprt *rpc_clnt_set_transport(struct rpc_clnt *clnt,

static void rpc_clnt_set_nodename(struct rpc_clnt *clnt, const char *nodename)
{
- clnt->cl_nodelen = strlen(nodename);
- if (clnt->cl_nodelen > UNX_MAXNODENAME)
- clnt->cl_nodelen = UNX_MAXNODENAME;
- memcpy(clnt->cl_nodename, nodename, clnt->cl_nodelen);
+ clnt->cl_nodelen = strlcpy(clnt->cl_nodename,
+ nodename, sizeof(clnt->cl_nodename));
}

static int rpc_client_register(struct rpc_clnt *clnt,
@@ -365,6 +363,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
const struct rpc_version *version;
struct rpc_clnt *clnt = NULL;
const struct rpc_timeout *timeout;
+ const char *nodename = args->nodename;
int err;

/* sanity check the name before trying to print it */
@@ -420,8 +419,10 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,

atomic_set(&clnt->cl_count, 1);

+ if (nodename == NULL)
+ nodename = utsname()->nodename;
/* save the nodename */
- rpc_clnt_set_nodename(clnt, utsname()->nodename);
+ rpc_clnt_set_nodename(clnt, nodename);

err = rpc_client_register(clnt, args->authflavor, args->client_name);
if (err)
@@ -576,6 +577,7 @@ static struct rpc_clnt *__rpc_clone_client(struct rpc_create_args *args,
if (xprt == NULL)
goto out_err;
args->servername = xprt->servername;
+ args->nodename = clnt->cl_nodename;

new = rpc_new_client(args, xprt, clnt);
if (IS_ERR(new)) {
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 05202012bcfc..cf5770d8f49a 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -355,7 +355,8 @@ out:
return result;
}

-static struct rpc_clnt *rpcb_create(struct net *net, const char *hostname,
+static struct rpc_clnt *rpcb_create(struct net *net, const char *nodename,
+ const char *hostname,
struct sockaddr *srvaddr, size_t salen,
int proto, u32 version)
{
@@ -365,6 +366,7 @@ static struct rpc_clnt *rpcb_create(struct net *net, const char *hostname,
.address = srvaddr,
.addrsize = salen,
.servername = hostname,
+ .nodename = nodename,
.program = &rpcb_program,
.version = version,
.authflavor = RPC_AUTH_UNIX,
@@ -740,7 +742,9 @@ void rpcb_getport_async(struct rpc_task *task)
dprintk("RPC: %5u %s: trying rpcbind version %u\n",
task->tk_pid, __func__, bind_version);

- rpcb_clnt = rpcb_create(xprt->xprt_net, xprt->servername, sap, salen,
+ rpcb_clnt = rpcb_create(xprt->xprt_net,
+ clnt->cl_nodename,
+ xprt->servername, sap, salen,
xprt->prot, bind_version);
if (IS_ERR(rpcb_clnt)) {
status = PTR_ERR(rpcb_clnt);
--
2.1.0




2015-01-31 00:44:29

by Nix

Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On 30 Jan 2015, Trond Myklebust uttered the following:

> On Sun, 2015-01-25 at 16:55 -0500, Trond Myklebust wrote:
>> On Sun, Jan 25, 2015 at 4:06 PM, Bruno Prémont
>> <[email protected]> wrote:
>> > On a system running a home-grown container (mntns, utsns, pidns, netns)
>> > with an NFS mount point bind-mounted into the container, I hit the
>> > following trace if the NFS filesystem is first umount()ed in the init
>> > namespace and then umounted from the container when the container exits.

I'm not using containers, but I *am* extensively bind-mounting
NFS filesystems, which probably has the same net effect.

> I was rather hoping that Bruno would fix up his patch and resend, but
> since other reports of the same bug are now surfacing... Please could
> you all check if something like the following patch fixes it.

I have to wait for another of those xprt != 0 warnings, I think. I've
had a couple of clean reboots, but I had the occasional clean reboot
even before this patch.

I'll let it run overnight and give it a reboot in the morning.

--
NULL && (void)

2015-01-31 09:03:10

by Bruno Prémont

Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On Fri, 30 Jan 2015 18:49:21 Trond Myklebust <[email protected]> wrote:
> On Sun, 2015-01-25 at 16:55 -0500, Trond Myklebust wrote:
> > On Sun, Jan 25, 2015 at 4:06 PM, Bruno Prémont
> > <[email protected]> wrote:
> > > On a system running a home-grown container (mntns, utsns, pidns,
> > > netns) with an NFS mount point bind-mounted into the container I
> > > hit the following trace if the NFS filesystem is first umount()ed
> > > in the init namespace and then umounted from the container when
> > > the container exits.
> > >
> >
> > We should rather change rpcb_create() to pass the nodename from the
> > parent. The point is that the rpc_clnt->cl_nodename is used in
> > various different contexts (for instance in the AUTH_SYS
> > credential) where it isn't always appropriate to have it set to an
> > empty string.
>
> I was rather hoping that Bruno would fix up his patch and resend, but
> since other reports of the same bug are now surfacing... Please could
> you all check if something like the following patch fixes it.

With FOSDEM this weekend I've had no time to look into it.

Will test when home on Wednesday.

Thanks,
Bruno


> Thanks
> Trond
>
> [... patch quoted in full above, snipped ...]
> xprt->prot, bind_version);
> if (IS_ERR(rpcb_clnt)) {
> status = PTR_ERR(rpcb_clnt);


2015-02-02 17:41:53

by Nix

[permalink] [raw]
Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On 31 Jan 2015, [email protected] told this:
> I'll let it run overnight and give it a reboot in the morning.

Alas, my latest reboot hit:

[ 215.245158] BUG: unable to handle kernel NULL pointer dereference at 00000004
[ 215.251602] IP: [<c034fb8c>] rpc_new_client+0x13b/0x1f2
[ 215.251602] *pde = 00000000
[ 215.251602] Oops: 0000 [#1]
[ 215.251602] CPU: 0 PID: 398 Comm: bash Not tainted 3.18.5+ #1
[ 215.251602] task: de1fcfc0 ti: de1fa000 task.ti: de1fa000
[ 215.251602] EIP: 0060:[<c034fb8c>] EFLAGS: 00010246 CPU: 0
[ 215.251602] EIP is at rpc_new_client+0x13b/0x1f2
[ 215.251602] EAX: 00000000 EBX: df70f000 ECX: 0000bae0 EDX: 00000005
[ 215.251602] ESI: 00000000 EDI: df6cc000 EBP: 00000007 ESP: de1fbcac
[ 215.251602] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 215.251602] CR0: 8005003b CR2: 00000004 CR3: 1f645000 CR4: 00000090
[ 215.251602] Stack:
[ 215.251602] 000000d0 df6cc000 de1fbd4c 00000000 de1fbd4c de1fbd4c de1fbd10 de1fbf8c
[ 215.251602] c0350196 de1fbd4c de2056d0 de1fbd10 c0350300 c0262e32 c0263841 df528000
[ 215.251602] a2e0b542 00000006 c044b178 00000000 de1fbddc 00000010 de2056d0 00000000
[ 215.251602] Call Trace:
[ 215.251602] [<c0350196>] ? rpc_create_xprt+0xc/0x74
[ 215.251602] [<c0350300>] ? rpc_create+0x102/0x10f
[ 215.251602] [<c0262e32>] ? ata_sff_check_status+0x8/0x9
[ 215.251602] [<c0263841>] ? ata_dev_select.constprop.20+0x83/0x95
[ 215.251602] [<c019ec6a>] ? __block_commit_write.isra.25+0x56/0x7f
[ 215.251602] [<c0359f04>] ? rpcb_create+0x6e/0x7c
[ 215.251602] [<c035a677>] ? rpcb_getport_async+0x124/0x25a
[ 215.251602] [<c0133bdf>] ? update_curr+0x81/0xb3
[ 215.251602] [<c0133e97>] ? check_preempt_wakeup+0xf0/0x134
[ 215.251602] [<c013110f>] ? check_preempt_curr+0x21/0x59
[ 215.251602] [<c0355294>] ? rpcauth_lookupcred+0x3f/0x47
[ 215.251602] [<c017ebb6>] ? __kmalloc+0xa3/0xc4
[ 215.251602] [<c0354c09>] ? rpc_malloc+0x39/0x48
[ 215.251602] [<c034ef05>] ? call_bind+0x2d/0x2e
[ 215.251602] [<c0354a71>] ? __rpc_execute+0x5c/0x187
[ 215.251602] [<c03500be>] ? rpc_run_task+0x55/0x5a
[ 215.251602] [<c035012c>] ? rpc_call_sync+0x69/0x81
[ 215.251602] [<c01e219e>] ? nsm_mon_unmon+0x8c/0xa0
[ 215.251602] [<c01e23c5>] ? nsm_unmonitor+0x5f/0xd3
[ 215.251602] [<c016b3cc>] ? bdi_unregister+0xf2/0x100
[ 215.251602] [<c01df4cc>] ? nlm_destroy_host_locked+0x4f/0x7c
[ 215.251602] [<c01df703>] ? nlmclnt_release_host+0xd8/0xe5
[ 215.251602] [<c01dddd5>] ? nlmclnt_done+0xc/0x14
[ 215.251602] [<c01ce621>] ? nfs_free_server+0x16/0x72
[ 215.251602] [<c01837a4>] ? deactivate_locked_super+0x26/0x37
[ 215.251602] [<c019493e>] ? cleanup_mnt+0x40/0x59
[ 215.251602] [<c012e185>] ? task_work_run+0x4f/0x5f
[ 215.251602] [<c0121373>] ? do_exit+0x264/0x670
[ 215.251602] [<c0127a2b>] ? __set_current_blocked+0xd/0xf
[ 215.251602] [<c0127b26>] ? sigprocmask+0x77/0x87
[ 215.251602] [<c012e097>] ? __task_pid_nr_ns+0x3a/0x41
[ 215.251602] [<c012e019>] ? find_vpid+0xd/0x17
[ 215.251602] [<c01217cc>] ? do_group_exit+0x2b/0x5d
[ 215.251602] [<c012180d>] ? SyS_exit_group+0xf/0xf
[ 215.251602] [<c03679b6>] ? syscall_call+0x7/0x7
[ 215.251602] Code: 89 43 40 8b 44 24 04 89 43 18 8d 43 78 8b 53 40 89 43 3c 8b 12 e8 32 ac 00 00 c7 03 01 00 00 00 a1 dc f0 42 c0 8b 80 00 03 00 00 <8b> 70 04 83 c6 45 89 f0 e8 ab d1 eb ff 83 f8 20 7f 05 89 43 44
[ 215.251602] EIP: [<c034fb8c>] rpc_new_client+0x13b/0x1f2 SS:ESP 0068:de1fbcac
[ 215.251602] CR2: 0000000000000004
[ 215.251602] ---[ end trace 4b9e971f6b3f2dc8 ]---
[ 215.251602] Kernel panic - not syncing: Fatal exception
[ 215.251602] Kernel Offset: 0x0 from 0xc0100000 (relocation range: 0xc0000000-0xe07fffff)
[ 215.251602] Rebooting in 5 seconds..

so clearly the bug is still there. Also, the connection I thought might
exist with the "xprt_adjust_timeout: rq_timeout = 0!" message was
illusory: that message hadn't recurred since before the last reboot but
one, yet the crash happened anyway.

--
NULL && (void)

2015-02-02 22:54:51

by Trond Myklebust

[permalink] [raw]
Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On Mon, Feb 2, 2015 at 12:41 PM, Nix <[email protected]> wrote:
> On 31 Jan 2015, [email protected] told this:
>> I'll let it run overnight and give it a reboot in the morning.
>
> Alas, my latest reboot hit:
>
> [ 215.245158] BUG: unable to handle kernel NULL pointer dereference at 00000004
> [ 215.251602] IP: [<c034fb8c>] rpc_new_client+0x13b/0x1f2
> <snip trace>
>
> so clearly the bug is still there. Also, the connection I thought might
> exist with the "xprt_adjust_timeout: rq_timeout = 0!" message was
> illusory: that message hadn't recurred since before the last reboot but
> one, yet the crash happened anyway.
>

Hmm... I'm at a loss to see how rpcb_create can ever call
rpc_new_client() with a null value for the nodename with that patch
applied. Are you 100% sure that the above Oops came from a patched
kernel? That IP address of "rpc_new_client+0x13b/0x1f2" looks
identical to the one in your original posting.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2015-02-03 00:20:55

by Nix

[permalink] [raw]
Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On 2 Feb 2015, Trond Myklebust verbalised:
> Hmm... I'm at a loss to see how rpcb_create can ever call
> rpc_new_client() with a null value for the nodename with that patch
> applied. Are you 100% sure that the above Oops came from a patched
> kernel? That IP address of "rpc_new_client+0x13b/0x1f2" looks
> identical to the one in your original posting.

I've been swapping kernels a lot of late due to bisection -- it is
perfectly possible that I somehow ended up running an unpatched one :/

I'll do a build from scratch with the patch and reboot into it. My
apologies if this was a false positive (which it looks quite like it
might have been: your evidence is pretty persuasive!)

--
NULL && (void)

2015-02-04 17:08:39

by Bruno Prémont

[permalink] [raw]
Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On Fri, 30 January 2015 Trond Myklebust wrote:
> On Sun, 2015-01-25 at 16:55 -0500, Trond Myklebust wrote:
> > On Sun, Jan 25, 2015 at 4:06 PM, Bruno Prémont wrote:
> > > On a system running a home-brewed container (mntns, utsns, pidns, netns)
> > > with an NFS mount-point bind-mounted into the container, I hit the following
> > > trace if the nfs filesystem is first umount()ed in the init ns and then later
> > > umounted from the container when the container exits.
> > >
> > > [51397.767310] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > > [51397.770671] IP: [<ffffffff81828173>] rpc_new_client+0x193/0x2b0
> > > [51397.773967] PGD 0
> > > [51397.777218] Oops: 0000 [#1] SMP
> > > [51397.780490] Modules linked in:
> > > [51397.783751] CPU: 0 PID: 1711 Comm: image-starter Not tainted 3.19.0-rc2-kvm+ #7
> > > [51397.787123] Hardware name: Gigabyte Technology Co., Ltd. GA-A75M-UD2H/GA-A75M-UD2H, BIOS F6 09/28/2012
> > > [51397.790606] task: ffff8800c9fcbac0 ti: ffff8801fe754000 task.ti: ffff8801fe754000
> > > [51397.794149] RIP: 0010:[<ffffffff81828173>] [<ffffffff81828173>] rpc_new_client+0x193/0x2b0
> > > [51397.797798] RSP: 0018:ffff8801fe757908 EFLAGS: 00010246
> > > [51397.801444] RAX: 0000000000000000 RBX: ffff88009dafb240 RCX: 0000000000000180
> > > [51397.805174] RDX: 000000000000bae0 RSI: 0000000000001770 RDI: ffff88009dafb308
> > > [51397.808913] RBP: ffff8801fe757948 R08: ffff88009daf92d8 R09: ffff88009dafb458
> > > [51397.812673] R10: ffff88009dafb458 R11: ffff88020ec15bc0 R12: ffff8801fe757a40
> > > [51397.816456] R13: ffffffff81b9d800 R14: ffff8800c6e31030 R15: 0000000000000000
> > > [51397.820270] FS: 00007f335a3a1700(0000) GS:ffff88020ec00000(0000) knlGS:00000000f7287700
> > > [51397.824168] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > [51397.828066] CR2: 0000000000000008 CR3: 00000001fc54d000 CR4: 00000000000007f0
> > > [51397.832017] Stack:
> > > [51397.835924] 000000000000000a ffffffff81b9d770 ffffffff81826450 ffff8801fe757a40
> > > [51397.840023] ffff8801fe757a40 ffff8800cf08d500 ffffffff81826450 ffffffff820f4728
> > > [51397.844130] ffff8801fe757978 ffffffff81828815 ffff8801fe757978 ffffffff8182aad8
> > > [51397.848224] Call Trace:
> > > [51397.852221] [<ffffffff81826450>] ? call_start+0x20/0x20
> > > [51397.856273] [<ffffffff81826450>] ? call_start+0x20/0x20
> > > [51397.860295] [<ffffffff81828815>] rpc_create_xprt+0x15/0xb0
> > > [51397.864324] [<ffffffff8182aad8>] ? xprt_create_transport+0x108/0x1b0
> > > [51397.868428] [<ffffffff81828971>] rpc_create+0xc1/0x190
> > > [51397.872574] [<ffffffff81111c86>] ? internal_add_timer+0x66/0x80
> > > [51397.876733] [<ffffffff81113a99>] ? mod_timer+0x109/0x1e0
> > > [51397.880877] [<ffffffff8183a19e>] rpcb_create+0x6e/0x90
> > > [51397.884999] [<ffffffff8183a71a>] rpcb_getport_async+0x15a/0x330
> > > [51397.889118] [<ffffffff8182f1da>] ? rpc_malloc+0x3a/0x70
> > > [51397.893240] [<ffffffff811af8d2>] ? __kmalloc+0xc2/0x170
> > > [51397.897354] [<ffffffff81826830>] ? call_reserveresult+0x110/0x110
> > > [51397.901490] [<ffffffff81826450>] ? call_start+0x20/0x20
> > > [51397.905606] [<ffffffff81826450>] ? call_start+0x20/0x20
> > > [51397.909662] [<ffffffff8182648e>] call_bind+0x3e/0x40
> > > [51397.913709] [<ffffffff8182fa99>] __rpc_execute+0x79/0x330
> > > [51397.917778] [<ffffffff818327bd>] rpc_execute+0x5d/0xa0
> > > [51397.921871] [<ffffffff818286cb>] rpc_run_task+0x6b/0x90
> > > [51397.925989] [<ffffffff8182872e>] rpc_call_sync+0x3e/0xa0
> > > [51397.930108] [<ffffffff8127fe29>] nsm_mon_unmon+0xb9/0xd0
> > > [51397.934191] [<ffffffff8110e2a0>] ? call_rcu_bh+0x20/0x20
> > > [51397.938235] [<ffffffff8128018c>] nsm_unmonitor+0x8c/0x140
> > > [51397.942309] [<ffffffff8127bc43>] nlm_destroy_host_locked+0x63/0xa0
> > > [51397.946442] [<ffffffff8127c03c>] nlmclnt_release_host+0x7c/0x130
> > > [51397.950591] [<ffffffff81279645>] nlmclnt_done+0x15/0x30
> > > [51397.954773] [<ffffffff81241862>] nfs_destroy_server+0x12/0x20
> > > [51397.958934] [<ffffffff81242372>] nfs_free_server+0x22/0xa0
> > > [51397.963053] [<ffffffff8124cadd>] nfs_kill_super+0x1d/0x30
> > > [51397.967158] [<ffffffff811c2e2c>] deactivate_locked_super+0x4c/0x70
> > > [51397.971286] [<ffffffff811c33f9>] deactivate_super+0x49/0x70
> > > [51397.975398] [<ffffffff811ddafe>] cleanup_mnt+0x3e/0x90
> > > [51397.979499] [<ffffffff811ddb9d>] __cleanup_mnt+0xd/0x10
> > > [51397.983598] [<ffffffff810e04cc>] task_work_run+0xbc/0xe0
> > > [51397.987697] [<ffffffff810c8f95>] do_exit+0x295/0xaf0
> > > [51397.991812] [<ffffffff811c2239>] ? ____fput+0x9/0x10
> > > [51397.995937] [<ffffffff810e04b4>] ? task_work_run+0xa4/0xe0
> > > [51398.000070] [<ffffffff810c986a>] do_group_exit+0x3a/0xa0
> > > [51398.004201] [<ffffffff810c98df>] SyS_exit_group+0xf/0x10
> > > [51398.008315] [<ffffffff8185e8d2>] system_call_fastpath+0x12/0x17
> > > [51398.012438] Code: 43 78 48 8d bb c8 00 00 00 48 89 7b 70 48 8b 30 e8 63 2d 01 00 c7 03 01 00 00 00 65 48 8b 04 25 00 aa 00 00 48 8b 80 c0 09 00 00 <4c> 8b 68 08 49 83 c5 45 4
> > > [51398.022378] RIP [<ffffffff81828173>] rpc_new_client+0x193/0x2b0
> > > [51398.026732] RSP <ffff8801fe757908>
> > > [51398.031025] CR2: 0000000000000008
> > > [51398.035326] ---[ end trace b701b037bc457620 ]---
> > > [51398.058223] Fixing recursive fault but reboot is needed!
> >
> > We should rather change rpcb_create() to pass the nodename from the
> > parent. The point is that the rpc_clnt->cl_nodename is used in various
> > different contexts (for instance in the AUTH_SYS credential) where it
> > isn't always appropriate to have it set to an empty string.
>
> I was rather hoping that Bruno would fix up his patch and resend, but
> since other reports of the same bug are now surfacing... Please could
> you all check if something like the following patch fixes it.

This patch works for me, so
Tested-by: Bruno Prémont <[email protected]>

Now I get just the following complaint in dmesg on shutdown:
[ 1940.173201] lockd: cannot unmonitor nfs.home
^^^^^^^^ name of NFS server

This complaint did not happen with my "empty string" name
patch.

Thanks,
Bruno


> Thanks
> Trond
>
> 8<---------------------------------------------------------------------
> From 87557df0ca2241da0edac558286650fdb081152c Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Fri, 30 Jan 2015 18:12:28 -0500
> Subject: [PATCH] SUNRPC: NULL utsname dereference on NFS umount during
> namespace cleanup
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Fix an Oopsable condition when nsm_mon_unmon is called as part of the
> namespace cleanup, which now apparently happens after the utsname
> has been freed.
>
> Link: http://lkml.kernel.org/r/[email protected]
> Reported-by: Bruno Prémont <[email protected]>
> Cc: [email protected] # 3.18
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/lockd/mon.c | 13 +++++++++----
> include/linux/sunrpc/clnt.h | 3 ++-
> net/sunrpc/clnt.c | 12 +++++++-----
> net/sunrpc/rpcb_clnt.c | 8 ++++++--
> 4 files changed, 24 insertions(+), 12 deletions(-)
>
> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
> index 1cc6ec51e6b1..47a32b6d9b90 100644
> --- a/fs/lockd/mon.c
> +++ b/fs/lockd/mon.c
> @@ -65,7 +65,7 @@ static inline struct sockaddr *nsm_addr(const struct nsm_handle *nsm)
> return (struct sockaddr *)&nsm->sm_addr;
> }
>
> -static struct rpc_clnt *nsm_create(struct net *net)
> +static struct rpc_clnt *nsm_create(struct net *net, const char *nodename)
> {
> struct sockaddr_in sin = {
> .sin_family = AF_INET,
> @@ -77,6 +77,7 @@ static struct rpc_clnt *nsm_create(struct net *net)
> .address = (struct sockaddr *)&sin,
> .addrsize = sizeof(sin),
> .servername = "rpc.statd",
> + .nodename = nodename,
> .program = &nsm_program,
> .version = NSM_VERSION,
> .authflavor = RPC_AUTH_NULL,
> @@ -102,7 +103,7 @@ out:
> return clnt;
> }
>
> -static struct rpc_clnt *nsm_client_get(struct net *net)
> +static struct rpc_clnt *nsm_client_get(struct net *net, const char *nodename)
> {
> struct rpc_clnt *clnt, *new;
> struct lockd_net *ln = net_generic(net, lockd_net_id);
> @@ -111,7 +112,7 @@ static struct rpc_clnt *nsm_client_get(struct net *net)
> if (clnt != NULL)
> goto out;
>
> - clnt = new = nsm_create(net);
> + clnt = new = nsm_create(net, nodename);
> if (IS_ERR(clnt))
> goto out;
>
> @@ -190,19 +191,23 @@ int nsm_monitor(const struct nlm_host *host)
> struct nsm_res res;
> int status;
> struct rpc_clnt *clnt;
> + const char *nodename = NULL;
>
> dprintk("lockd: nsm_monitor(%s)\n", nsm->sm_name);
>
> if (nsm->sm_monitored)
> return 0;
>
> + if (host->h_rpcclnt)
> + nodename = host->h_rpcclnt->cl_nodename;
> +
> /*
> * Choose whether to record the caller_name or IP address of
> * this peer in the local rpc.statd's database.
> */
> nsm->sm_mon_name = nsm_use_hostnames ? nsm->sm_name : nsm->sm_addrbuf;
>
> - clnt = nsm_client_get(host->net);
> + clnt = nsm_client_get(host->net, nodename);
> if (IS_ERR(clnt)) {
> status = PTR_ERR(clnt);
> dprintk("lockd: failed to create NSM upcall transport, "
> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
> index d86acc63b25f..598ba80ec30c 100644
> --- a/include/linux/sunrpc/clnt.h
> +++ b/include/linux/sunrpc/clnt.h
> @@ -57,7 +57,7 @@ struct rpc_clnt {
> const struct rpc_timeout *cl_timeout; /* Timeout strategy */
>
> int cl_nodelen; /* nodename length */
> - char cl_nodename[UNX_MAXNODENAME];
> + char cl_nodename[UNX_MAXNODENAME+1];
> struct rpc_pipe_dir_head cl_pipedir_objects;
> struct rpc_clnt * cl_parent; /* Points to parent of clones */
> struct rpc_rtt cl_rtt_default;
> @@ -112,6 +112,7 @@ struct rpc_create_args {
> struct sockaddr *saddress;
> const struct rpc_timeout *timeout;
> const char *servername;
> + const char *nodename;
> const struct rpc_program *program;
> u32 prognumber; /* overrides program->number */
> u32 version;
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 05da12a33945..3f5d4d48f0cb 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -286,10 +286,8 @@ static struct rpc_xprt *rpc_clnt_set_transport(struct rpc_clnt *clnt,
>
> static void rpc_clnt_set_nodename(struct rpc_clnt *clnt, const char *nodename)
> {
> - clnt->cl_nodelen = strlen(nodename);
> - if (clnt->cl_nodelen > UNX_MAXNODENAME)
> - clnt->cl_nodelen = UNX_MAXNODENAME;
> - memcpy(clnt->cl_nodename, nodename, clnt->cl_nodelen);
> + clnt->cl_nodelen = strlcpy(clnt->cl_nodename,
> + nodename, sizeof(clnt->cl_nodename));
> }
>
> static int rpc_client_register(struct rpc_clnt *clnt,
> @@ -365,6 +363,7 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
> const struct rpc_version *version;
> struct rpc_clnt *clnt = NULL;
> const struct rpc_timeout *timeout;
> + const char *nodename = args->nodename;
> int err;
>
> /* sanity check the name before trying to print it */
> @@ -420,8 +419,10 @@ static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
>
> atomic_set(&clnt->cl_count, 1);
>
> + if (nodename == NULL)
> + nodename = utsname()->nodename;
> /* save the nodename */
> - rpc_clnt_set_nodename(clnt, utsname()->nodename);
> + rpc_clnt_set_nodename(clnt, nodename);
>
> err = rpc_client_register(clnt, args->authflavor, args->client_name);
> if (err)
> @@ -576,6 +577,7 @@ static struct rpc_clnt *__rpc_clone_client(struct rpc_create_args *args,
> if (xprt == NULL)
> goto out_err;
> args->servername = xprt->servername;
> + args->nodename = clnt->cl_nodename;
>
> new = rpc_new_client(args, xprt, clnt);
> if (IS_ERR(new)) {
> diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
> index 05202012bcfc..cf5770d8f49a 100644
> --- a/net/sunrpc/rpcb_clnt.c
> +++ b/net/sunrpc/rpcb_clnt.c
> @@ -355,7 +355,8 @@ out:
> return result;
> }
>
> -static struct rpc_clnt *rpcb_create(struct net *net, const char *hostname,
> +static struct rpc_clnt *rpcb_create(struct net *net, const char *nodename,
> + const char *hostname,
> struct sockaddr *srvaddr, size_t salen,
> int proto, u32 version)
> {
> @@ -365,6 +366,7 @@ static struct rpc_clnt *rpcb_create(struct net *net, const char *hostname,
> .address = srvaddr,
> .addrsize = salen,
> .servername = hostname,
> + .nodename = nodename,
> .program = &rpcb_program,
> .version = version,
> .authflavor = RPC_AUTH_UNIX,
> @@ -740,7 +742,9 @@ void rpcb_getport_async(struct rpc_task *task)
> dprintk("RPC: %5u %s: trying rpcbind version %u\n",
> task->tk_pid, __func__, bind_version);
>
> - rpcb_clnt = rpcb_create(xprt->xprt_net, xprt->servername, sap, salen,
> + rpcb_clnt = rpcb_create(xprt->xprt_net,
> + clnt->cl_nodename,
> + xprt->servername, sap, salen,
> xprt->prot, bind_version);
> if (IS_ERR(rpcb_clnt)) {
> status = PTR_ERR(rpcb_clnt);

2015-02-04 19:06:49

by Trond Myklebust

[permalink] [raw]
Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On Wed, Feb 4, 2015 at 12:08 PM, Bruno Prémont
<[email protected]> wrote:
>
> On Fri, 30 January 2015 Trond Myklebust wrote:
> > On Sun, 2015-01-25 at 16:55 -0500, Trond Myklebust wrote:
> > > On Sun, Jan 25, 2015 at 4:06 PM, Bruno Prémont wrote:
> > > > On a system running a home-brewed container (mntns, utsns, pidns, netns)
> > > > with an NFS mount-point bind-mounted into the container, I hit the following
> > > > trace if the nfs filesystem is first umount()ed in the init ns and then later
> > > > umounted from the container when the container exits.
> > > >
> > > > <snip trace>
> > >
> > > We should rather change rpcb_create() to pass the nodename from the
> > > parent. The point is that the rpc_clnt->cl_nodename is used in various
> > > different contexts (for instance in the AUTH_SYS credential) where it
> > > isn't always appropriate to have it set to an empty string.
> >
> > I was rather hoping that Bruno would fix up his patch and resend, but
> > since other reports of the same bug are now surfacing... Please could
> > you all check if something like the following patch fixes it.
>
> This patch works for me, so
> Tested-by: Bruno Prémont <[email protected]>
>
> Now I get just the following complaint in dmesg on shutdown:
> [ 1940.173201] lockd: cannot unmonitor nfs.home
> ^^^^^^^^ name of NFS server
>
> This complaint did not happen with my "empty string" name
> patch.

Are there any clues from rpc.statd in your syslog that might help to
explain the error?

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2015-02-04 21:52:35

by Bruno Prémont

[permalink] [raw]
Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On Wed, 4 Feb 2015 14:06:48 Trond Myklebust wrote:
> On Wed, Feb 4, 2015 at 12:08 PM, Bruno Prémont wrote:
> > On Fri, 30 January 2015 Trond Myklebust wrote:
> > > On Sun, 2015-01-25 at 16:55 -0500, Trond Myklebust wrote:
> > > > On Sun, Jan 25, 2015 at 4:06 PM, Bruno Prémont wrote:
> > > > > On a system running a home-brewed container (mntns, utsns,
> > > > > pidns, netns) with an NFS mount-point bind-mounted into the
> > > > > container, I hit the following trace if the nfs filesystem is
> > > > > first umount()ed in the init ns and then later umounted from
> > > > > the container when the container exits.
> > > > >
> > > > > <snip trace>
> > > >
> > > > We should rather change rpcb_create() to pass the nodename from
> > > > the parent. The point is that the rpc_clnt->cl_nodename is used
> > > > in various different contexts (for instance in the AUTH_SYS
> > > > credential) where it isn't always appropriate to have it set to
> > > > an empty string.
> > >
> > > I was rather hoping that Bruno would fix up his patch and resend,
> > > but since other reports of the same bug are now surfacing...
> > > Please could you all check if something like the following patch
> > > fixes it.
> >
> > This patch works for me, so
> > Tested-by: Bruno Prémont <[email protected]>
> >
> > Now I get just the following complaint in dmesg on shutdown:
> > [ 1940.173201] lockd: cannot unmonitor nfs.home
> > ^^^^^^^^ name of NFS server
> >
> > This complaint did not happen with my "empty string" name
> > patch.
>
> Are there any clues from rpc.statd in your syslog that might help to
> explain the error?

I would say rpc.statd is no longer running at that time.

My shutdown process looks roughly as follows (it's not necessarily the
optimal ordering):

- umount nfs mountpoints from root mntns
- stop nfsclient
- stop rpc.statd
- stop rpcbind
- stop containers (with bind-mounted nfs mountpoints from root mntns
that get implicitly umounted on mntns release)

Bruno

2015-02-04 22:29:53

by Trond Myklebust

[permalink] [raw]
Subject: Re: [Patch] sunrpc: NULL utsname dereference on NFS umount during namespace cleanup

On Wed, Feb 4, 2015 at 4:52 PM, Bruno Prémont <[email protected]> wrote:
> On Wed, 4 Feb 2015 14:06:48 Trond Myklebust wrote:
>> On Wed, Feb 4, 2015 at 12:08 PM, Bruno Prémont wrote:
>> > On Fri, 30 January 2015 Trond Myklebust wrote:
>> > > On Sun, 2015-01-25 at 16:55 -0500, Trond Myklebust wrote:
>> > > > On Sun, Jan 25, 2015 at 4:06 PM, Bruno Prémont wrote:
>> > > > > On a system running a home-brewed container (mntns, utsns,
>> > > > > pidns, netns) with an NFS mount-point bind-mounted into the
>> > > > > container, I hit the following trace if the nfs filesystem is
>> > > > > first umount()ed in the init ns and then later umounted from
>> > > > > the container when the container exits.
>> > > > >
>> > > > > <snip trace>
>> > > >
>> > > > We should rather change rpcb_create() to pass the nodename from
>> > > > the parent. The point is that the rpc_clnt->cl_nodename is used
>> > > > in various different contexts (for instance in the AUTH_SYS
>> > > > credential) where it isn't always appropriate to have it set to
>> > > > an empty string.
>> > >
>> > > I was rather hoping that Bruno would fix up his patch and resend,
>> > > but since other reports of the same bug are now surfacing...
>> > > Please could you all check if something like the following patch
>> > > fixes it.
>> >
>> > This patch works for me, so
>> > Tested-by: Bruno Prémont <[email protected]>
>> >
>> > Now I get just the following complaint in dmesg on shutdown:
>> > [ 1940.173201] lockd: cannot unmonitor nfs.home
>> > ^^^^^^^^ name of NFS server
>> >
>> > This complaint did not happen with my "empty string" name
>> > patch.
>>
>> Are there any clues from rpc.statd in your syslog that might help to
>> explain the error?
>
> I would say rpc.statd is no longer running at that time.
>
> My shutdown process looks roughly as follows (it's not necessarily the
> optimal ordering):
>
> - umount nfs mountpoints from root mntns
> - stop nfsclient
> - stop rpc.statd
> - stop rpcbind
> - stop containers (with bind-mounted nfs mountpoints from root mntns
> that get implicitly umounted on mntns release)
>

In that case, the above message is a good indication that the kernel is
working as expected. If there is no rpc.statd to talk to, then the
unmonitor RPC call is expected to fail.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]