2016-10-25 21:50:03

by Yotam Gigi

Subject: RE: nfs NULL-dereferencing in net-next


>-----Original Message-----
>From: [email protected] [mailto:[email protected]] On
>Behalf Of Jakub Kicinski
>Sent: Monday, October 17, 2016 10:20 PM
>To: Andy Adamson <[email protected]>; Anna Schumaker
><[email protected]>; [email protected]
>Cc: [email protected]; Trond Myklebust <[email protected]>
>Subject: nfs NULL-dereferencing in net-next
>
>Hi!
>
>I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
>("fsl/fman: fix error return code in mac_probe()").


I see the same thing. It happens constantly on some of my machines, making them
completely unusable.

I bisected it and got to the commit:

commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274
Author: Andy Adamson <[email protected]>
Date: Fri Sep 9 09:22:27 2016 -0400

NFS add xprt switch addrs test to match client

Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>


>
>[ 23.409633] BUG: unable to handle kernel NULL pointer dereference at
>0000000000000172
>[ 23.418716] IP: [<ffffffffc041776c>] rpc_clnt_xprt_switch_has_addr+0xc/0x40
>[sunrpc]
>[ 23.427574] PGD 859020067 [ 23.430472] PUD 858f2d067
>PMD 0 [ 23.434311]
>[ 23.436133] Oops: 0000 [#1] PREEMPT SMP
>[ 23.440506] Modules linked in: nfsv4 ip6table_filter ip6_tables iptable_filter
>ip_tables ebtable_nat ebtables x_tables intel_ri
>[ 23.505915] CPU: 1 PID: 1067 Comm: mount.nfs Not tainted 4.8.0-perf-13951-
>g3f3177bb680f #51
>[ 23.515363] Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 1.2.10
>03/10/2015
>[ 23.523937] task: ffff983e9086ea00 task.stack: ffffac6c0a57c000
>[ 23.530641] RIP: 0010:[<ffffffffc041776c>] [<ffffffffc041776c>]
>rpc_clnt_xprt_switch_has_addr+0xc/0x40 [sunrpc]
>[ 23.542229] RSP: 0018:ffffac6c0a57fb28 EFLAGS: 00010a97
>[ 23.548255] RAX: 00000000c80214ac RBX: ffff983e97c7b000 RCX: ffff983e9b3bc180
>[ 23.556320] RDX: 0000000000000001 RSI: ffff983e9928ed28 RDI: ffffffffffffffea
>[ 23.564386] RBP: ffffac6c0a57fb38 R08: ffff983e97090630 R09: ffff983e9928ed30
>[ 23.572452] R10: ffffac6c0a57fba0 R11: 0000000000000010 R12: ffffac6c0a57fba0
>[ 23.580517] R13: ffff983e9928ed28 R14: 0000000000000000 R15: ffff983e91360560
>[ 23.588585] FS: 00007f4c348aa880(0000) GS:ffff983e9f240000(0000)
>knlGS:0000000000000000
>[ 23.597742] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[ 23.604251] CR2: 0000000000000172 CR3: 0000000850a5f000 CR4:
>00000000001406e0
>[ 23.612316] Stack:
>[ 23.614648] ffff983e97c7b000 ffffac6c0a57fba0 ffffac6c0a57fb90 ffffffffc04d38c3
>[ 23.623331] ffff983e91360500 ffff983e9928ed30 ffffffffc0b9e560
>ffff983e913605b8
>[ 23.632016] ffff983e9882e800 ffff983e9882e800 ffffac6c0a57fc30 ffffac6c0a57fdb8
>[ 23.640706] Call Trace:
>[ 23.643535] [<ffffffffc04d38c3>] nfs_get_client+0x123/0x340 [nfs]
>[ 23.650542] [<ffffffffc0b8f070>] nfs4_set_client+0x80/0xb0 [nfsv4]
>[ 23.657642] [<ffffffffc0b90305>] nfs4_create_server+0x115/0x2a0 [nfsv4]
>[ 23.665230] [<ffffffffc0b888ce>] nfs4_remote_mount+0x2e/0x60 [nfsv4]
>[ 23.672519] [<ffffffffba1e590a>] mount_fs+0x3a/0x160
>[ 23.678254] [<ffffffffba201a5e>] ? alloc_vfsmnt+0x19e/0x230
>[ 23.684669] [<ffffffffba201b57>] vfs_kern_mount+0x67/0x110
>[ 23.690990] [<ffffffffc0b887f4>] nfs_do_root_mount+0x84/0xc0 [nfsv4]
>[ 23.698284] [<ffffffffc0b88b97>] nfs4_try_mount+0x37/0x50 [nfsv4]
>[ 23.705287] [<ffffffffc04dfbd1>] nfs_fs_mount+0x2d1/0xa70 [nfs]
>[ 23.712092] [<ffffffffba3a6228>] ? find_next_bit+0x18/0x20
>[ 23.718413] [<ffffffffc04deac0>] ? nfs_remount+0x3c0/0x3c0 [nfs]
>[ 23.725316] [<ffffffffc04dedb0>] ? nfs_clone_super+0x130/0x130 [nfs]
>[ 23.732606] [<ffffffffba1e590a>] mount_fs+0x3a/0x160
>[ 23.738340] [<ffffffffba201a5e>] ? alloc_vfsmnt+0x19e/0x230
>[ 23.744755] [<ffffffffba201b57>] vfs_kern_mount+0x67/0x110
>[ 23.751071] [<ffffffffba2041df>] do_mount+0x1bf/0xc70
>[ 23.756904] [<ffffffffba203e9b>] ? copy_mount_options+0xbb/0x220
>[ 23.763803] [<ffffffffba204fa3>] SyS_mount+0x83/0xd0
>[ 23.769538] [<ffffffffba6f1ea4>] entry_SYSCALL_64_fastpath+0x17/0x98
>[ 23.776817] Code: 01 00 48 8b 93 f8 04 00 00 44 89 e6 48 c7 c7 98 b2 43 c0 e8 9f 0d d4
>f9 eb c0 0f 1f 44 00 00 0f 1f 44 00 00
>[ 23.802909] RIP [<ffffffffc041776c>] rpc_clnt_xprt_switch_has_addr+0xc/0x40
>[sunrpc]
>[ 23.811857] RSP <ffffac6c0a57fb28>
>[ 23.815839] CR2: 0000000000000172
>[ 23.819629] ---[ end trace 9958eca92c9eeafe ]---
>[ 23.827345] note: mount.nfs[1067] exited with preempt_count 1
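
One detail worth reading out of the registers above: RDI is ffffffffffffffea,
which is -22 (-EINVAL) as a signed 64-bit value, and CR2 is 0x172. The sketch
below redoes that arithmetic in plain userspace C. It assumes (this is not
confirmed anywhere in the thread) that RDI still holds the first argument,
clnt, at the faulting instruction, which is plausible on x86-64 because the
fault is only 0xc bytes into the function. The helper names are made up for
illustration; only the 4095 limit mirrors the kernel's MAX_ERRNO.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: distance from a base register value to the faulting
 * address (CR2), computed modulo 2^64 like the CPU does. */
static int64_t fault_offset(uint64_t cr2, uint64_t base)
{
	return (int64_t)(cr2 - base);
}

/* Hypothetical helper modelled on the kernel's IS_ERR_VALUE(): the top 4095
 * values of the address space are reserved for encoded -errno codes. */
static int looks_like_err_ptr(uint64_t value)
{
	return value >= (uint64_t)-4095;
}

/* Decode an error-pointer value back into its errno number. */
static int err_ptr_to_errno(uint64_t value)
{
	return (int)-(int64_t)value;
}
```

With the values from the oops, RDI decodes to EINVAL and the offset works out
to 0x188, so the fault address is consistent with reading a field at offset
0x188 from an ERR_PTR(-EINVAL) value rather than from a plain NULL pointer.
Whether 0x188 really is the offset of cl_xpi.xpi_xpswitch in this build would
have to be checked against the struct layout.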


2016-10-26 14:40:36

by Anna Schumaker

Subject: Re: nfs NULL-dereferencing in net-next

On 10/25/2016 01:19 PM, Yotam Gigi wrote:
>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On
>> Behalf Of Jakub Kicinski
>> Sent: Monday, October 17, 2016 10:20 PM
>> To: Andy Adamson <[email protected]>; Anna Schumaker
>> <[email protected]>; [email protected]
>> Cc: [email protected]; Trond Myklebust <[email protected]>
>> Subject: nfs NULL-dereferencing in net-next
>>
>> Hi!
>>
>> I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
>> ("fsl/fman: fix error return code in mac_probe()").
>
>
> I see the same thing. It happens constantly on some of my machines, making them
> completely unusable.
>
> I bisected it and got to the commit:
>
> commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274
> Author: Andy Adamson <[email protected]>
> Date: Fri Sep 9 09:22:27 2016 -0400
>
> NFS add xprt switch addrs test to match client
>
> Signed-off-by: Andy Adamson <[email protected]>
> Signed-off-by: Anna Schumaker <[email protected]>

Thanks for reporting this, everyone! Does this patch help?

From 96376ca1dd4077a1d341bdcb9cc86426ee3844f1 Mon Sep 17 00:00:00 2001
From: Anna Schumaker <[email protected]>
Date: Wed, 26 Oct 2016 10:33:31 -0400
Subject: [PATCH] SUNRPC: Fix suspicious RCU usage

We need to hold the rcu_read_lock() when calling rcu_dereference(),
otherwise we can't guarantee that the object being dereferenced still
exists.

Signed-off-by: Anna Schumaker <[email protected]>
---
 net/sunrpc/clnt.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 34dd7b2..62a4827 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2753,14 +2753,18 @@ EXPORT_SYMBOL_GPL(rpc_cap_max_reconnect_timeout);
 
 void rpc_clnt_xprt_switch_put(struct rpc_clnt *clnt)
 {
+	rcu_read_lock();
 	xprt_switch_put(rcu_dereference(clnt->cl_xpi.xpi_xpswitch));
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_put);
 
 void rpc_clnt_xprt_switch_add_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt)
 {
+	rcu_read_lock();
 	rpc_xprt_switch_add_xprt(rcu_dereference(clnt->cl_xpi.xpi_xpswitch),
 				 xprt);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_add_xprt);
 
@@ -2770,9 +2774,8 @@ bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt,
 	struct rpc_xprt_switch *xps;
 	bool ret;
 
-	xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch);
-
 	rcu_read_lock();
+	xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch);
 	ret = rpc_xprt_switch_has_addr(xps, sap);
 	rcu_read_unlock();
 	return ret;
--
2.10.1
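
The ordering rule from the commit message (fetch the pointer only after
entering the read-side section) can be illustrated with a toy, single-threaded
userspace model. This is only a sketch under loud assumptions: every toy_*
name is invented here, none of it is the kernel RCU API, a reader counter
stands in for a grace period, and a "freed" flag stands in for kfree() so the
example can inspect the object safely. The "concurrent" updater is simulated
by calling it inline.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct rpc_xprt_switch; NOT the kernel structure. */
struct toy_switch {
	int nr_xprts;
	bool freed;	/* set where real code would free the object */
};

static int toy_readers;			/* readers inside a read-side section */
static struct toy_switch *toy_current;	/* the protected pointer */
static struct toy_switch *toy_retired;	/* one old object awaiting reclaim */

/* Reclaim the retired object once no reader can still hold it. */
static void toy_reclaim(void)
{
	if (toy_readers == 0 && toy_retired) {
		toy_retired->freed = true;
		toy_retired = NULL;
	}
}

static void toy_read_lock(void)   { toy_readers++; }
static void toy_read_unlock(void) { toy_readers--; toy_reclaim(); }

/* Updater: publish a new object, retire the old one (single slot only). */
static void toy_replace(struct toy_switch *new_switch)
{
	toy_retired = toy_current;
	toy_current = new_switch;
	toy_reclaim();
}

/* Patched ordering: the pointer is loaded inside the read-side section. */
static bool read_inside_section(struct toy_switch *replacement)
{
	toy_read_lock();
	struct toy_switch *xps = toy_current;
	toy_replace(replacement);	/* updater runs while we hold xps */
	bool was_freed = xps->freed;	/* reader is active, so still alive */
	toy_read_unlock();
	return was_freed;
}

/* Pre-patch ordering: the pointer is loaded before entering the section. */
static bool read_outside_section(struct toy_switch *replacement)
{
	struct toy_switch *xps = toy_current;
	toy_replace(replacement);	/* no reader yet: reclaimed at once */
	toy_read_lock();
	bool was_freed = xps->freed;	/* real code: use-after-free here */
	toy_read_unlock();
	return was_freed;
}
```

In this model read_inside_section() never observes a reclaimed object, while
read_outside_section() does as soon as an update lands between the load and
the lock, which is the window the patch closes in
rpc_clnt_xprt_switch_has_addr().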
