2019-04-21 04:16:31

by Doug Nazar

Subject: nfsd bug after f875a792abe9 (nfsd: allow nfsv3 readdir request to be larger.)

After recently upgrading the kernel on one of my servers I was getting
the below crashes. I tracked it down to an NFSv3 root client; bisection
points to f875a792abe9. A revert of that on top of
5.1.0-rc5-00289-gee2725f711fa works fine.
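The bisection step above can be automated with `git bisect run`. Below is a
minimal self-contained sketch of the mechanics; the throwaway repository and
the pass/fail test are fabricated for illustration (a real kernel bisection
would build, boot, and exercise the NFS client at each step instead):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q .
git config user.email "a@b"; git config user.name "test"

# Eight commits; commit 5 introduces the "regression".
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -ge 5 ]; then echo bad > state; else echo good > state; fi
    echo "$i" > "file$i"
    git add .
    git commit -qm "commit $i"
done

# Bisect between the known-bad tip and a known-good ancestor,
# letting the test command classify each revision automatically
# (exit 0 = good, nonzero = bad).
git bisect start HEAD HEAD~7 >/dev/null 2>&1
git bisect run grep -q good state >/dev/null 2>&1

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad: $first_bad"
```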

Relevant exports:
/storage2/cubox/root  192.168.21.0/24(sec=sys,rw,no_root_squash,insecure,nohide,no_subtree_check,crossmnt,async)
/                     *(sec=krb5,rw,fsid=0,no_root_squash,insecure,no_subtree_check,crossmnt,async)
/storage2             *(sec=krb5,rw,no_root_squash,insecure,nohide,no_subtree_check,crossmnt,async)


Client has the following mounts to this server (running a modified 4.9.124):
192.168.21.90:/storage2/cubox/root on / type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.21.90,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.21.90)
192.168.21.90:/storage2/cubox/root/var/tmp on /var/tmp type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=192.168.21.242,local_lock=none,addr=192.168.21.90)


Thanks,
Doug


Initial BUG with HEAD when I started testing:
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294654] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294772] #PF error: [normal kernel read fault]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294776] PGD 0 P4D 0
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294780] Oops: 0000 [#1] SMP NOPTI
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294784] CPU: 1 PID: 5459 Comm: nfsd Not tainted 5.1.0-rc5-00289-gee2725f711fa #67
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294787] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78LMT-S2/GA-MA78LMT-S2, BIOS F14 08/16/2011
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294794] RIP: 0010:do_tcp_sendpages+0x308/0x5d0
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294799] Code: ff 48 63 e8 48 8b 5c 24 40 65 48 33 1c 25 28 00 00 00 48 89 e8 0f 85 73 02 00 00 48 83 c4 48 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <49> 8b 57 08 48 8d 42 ff 83 e2 01 49 0f 44 c7 f0 ff 40 34 8b 91 b8
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294804] RSP: 0018:ffffc90000437c90 EFLAGS: 00010202
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294808] RAX: 0000000000000001 RBX: 0000000000028000 RCX: ffff888210505600
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294811] RDX: ffffffff8154b1a0 RSI: 0000000000001000 RDI: ffff888210316e80
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294814] RBP: 0000000000000000 R08: 0000000000001000 R09: 0000000000000003
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294817] R10: 0000000000000002 R11: 0000000000000005 R12: 0000000000001000
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294820] R13: 00000000000005a8 R14: ffff888210316e80 R15: 0000000000000000
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294823] FS:  0000000000000000(0000) GS:ffff888217a40000(0000) knlGS:0000000000000000
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294826] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294829] CR2: 0000000000000008 CR3: 00000002150aa000 CR4: 00000000000006e0
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294832] Call Trace:
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294837]  tcp_sendpage_locked+0x42/0x60
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294841]  tcp_sendpage+0x38/0x50
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294844]  inet_sendpage+0x3a/0xe0
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294856]  ? encode_post_op_attr.isra.4+0xf4/0x100 [nfsd]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294865]  kernel_sendpage+0x15/0x30
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294878]  svc_send_common+0xed/0x150 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294887]  svc_sendto+0xf6/0x1a0 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294892]  ? refcount_dec_and_lock+0x18/0x70
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294899]  ? auth_domain_put+0x10/0x60 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294907]  ? svc_authorise+0x34/0x40 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294915]  ? svc_process_common.isra.8+0x10b/0x6a0 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294923]  svc_tcp_sendto+0x29/0x40 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294930]  svc_send+0x94/0x190 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.294990]  ? nfsd_destroy+0x50/0x50 [nfsd]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.295049]  nfsd+0xdd/0x140 [nfsd]
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.295054]  kthread+0x10e/0x130
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.295058]  ? kthread_park+0x80/0x80
Apr 20 05:48:10 wraith.dragoninc.ca  [  968.295062]  ret_from_fork+0x22/0x40


Sometimes during bisection I saw this error instead:
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972318] BUG: unable to handle kernel paging request at ffff888214c15000
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972336] #PF error: [PROT] [WRITE]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972339] PGD 2201067 P4D 2201067 PUD 2204067 PMD 2155bd063 PTE 8000000214c15161
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972344] Oops: 0003 [#1] SMP NOPTI
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972349] CPU: 1 PID: 5456 Comm: nfsd Not tainted 5.0.0-11058-ga840b56ba385 #79
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972352] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78LMT-S2/GA-MA78LMT-S2, BIOS F14 08/16/2011
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972360] RIP: 0010:__memmove+0x57/0x1a0
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972363] Code: 00 72 05 40 38 fe 74 3b 48 83 ea 20 48 83 ea 20 4c 8b 1e 4c 8b 56 08 4c 8b 4e 10 4c 8b 46 18 48 8d 76 20 4c 89 1f 4c 89 57 08 <4c> 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972370] RSP: 0018:ffffc90000437d48 EFLAGS: 00010286
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972373] RAX: ffff88821498400c RBX: ffffffffdfc7fff4 RCX: ffff888214984024
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972377] RDX: ffffffffdf9eefd4 RSI: ffff8882121b8000 RDI: ffff888214c14fec
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972379] RBP: 000000002038002c R08: 0000000000000000 R09: 0000000000000000
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972382] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972385] R13: ffff888214e692d4 R14: ffff888211f27000 R15: 0000000000000007
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972388] FS:  0000000000000000(0000) GS:ffff888217a40000(0000) knlGS:0000000000000000
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972391] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972394] CR2: ffff888214c15000 CR3: 0000000210996000 CR4: 00000000000006e0
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972397] Call Trace:
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972413]  encode_entry.isra.8+0x1bc/0x310 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972422]  nfsd_readdir+0x184/0x230 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972430]  ? nfs3svc_encode_readdirres+0xc0/0xc0 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972437]  ? nfsd_finish_read+0x190/0x190 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972445]  nfsd3_proc_readdir+0x84/0x120 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972452]  nfsd_dispatch+0xc1/0x1b0 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972465]  svc_process_common.isra.8+0x355/0x6a0 [sunrpc]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972474]  ? svc_recv+0x2ac/0x820 [sunrpc]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972480]  ? nfsd_destroy+0x50/0x50 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972488]  svc_process+0xd2/0x110 [sunrpc]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972494]  nfsd+0xdd/0x140 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972499]  kthread+0x10e/0x130
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972502]  ? kthread_park+0x80/0x80
Apr 20 21:45:56 wraith.dragoninc.ca  [  365.972506]  ret_from_fork+0x22/0x40



2019-04-23 00:22:01

by NeilBrown

Subject: Re: nfsd bug after f875a792abe9 (nfsd: allow nfsv3 readdir request to be larger.)

On Sat, Apr 20 2019, Doug Nazar wrote:

> After recently upgrading the kernel on one of my servers I was getting
> the below crashes. Tracked it down to an NFSv3 root client, bisection
> points to f875a792abe9. A revert of that on top of
> 5.1.0-rc5-00289-gee2725f711fa works fine.

hi Doug,
thanks for the report and for bisecting!

I think this bug has been fixed, but the fix hasn't landed upstream yet.
If you
git pull git://git.linux-nfs.org/~bfields/linux.git nfs-next

you should get

Commit 3c86794ac0e6 ("nfsd/nfsd3_proc_readdir: fix buffer count and page pointers")

which should fix this.


Bruce: I don't see that patch in your "for-5.1" branch, but you said
"Queuing up for 5.1."
Did something go awry?

Thanks,
NeilBrown



2019-04-23 02:22:41

by Doug Nazar

Subject: Re: nfsd bug after f875a792abe9 (nfsd: allow nfsv3 readdir request to be larger.)

On 2019-04-22 20:21, NeilBrown wrote:
> I think this bug has been fixed, but the fix hasn't landed upstream yet.
> If you
> git pull git://git.linux-nfs.org/~bfields/linux.git nfs-next

s/nfs-next/nfsd-next/    ;-)

Seems to work fine. However, I didn't test just that particular patch,
only the whole tree.
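When testing a whole tree rather than one patch, whether a given commit is
actually contained in it can be checked with `git merge-base --is-ancestor`.
A minimal self-contained sketch with a throwaway repository (the commit
subjects are fabricated stand-ins, not the real nfsd commits):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q .
git config user.email "a@b"; git config user.name "test"
git commit -q --allow-empty -m "base"
git commit -q --allow-empty -m "nfsd: queued fix"   # stand-in for the fix commit
fix=$(git rev-parse HEAD)
git commit -q --allow-empty -m "later work"

# Is the fix an ancestor of the tree being tested?
if git merge-base --is-ancestor "$fix" HEAD; then
    echo "fix is included"                          # prints "fix is included"
fi
```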

I did poke around a bit in the git web interface before, but didn't see
anything that jumped out at me.

Thanks,
Doug


2019-04-23 14:38:55

by J. Bruce Fields

Subject: Re: nfsd bug after f875a792abe9 (nfsd: allow nfsv3 readdir request to be larger.)

On Tue, Apr 23, 2019 at 10:21:53AM +1000, NeilBrown wrote:
> On Sat, Apr 20 2019, Doug Nazar wrote:
>
> > After recently upgrading the kernel on one of my servers I was getting
> > the below crashes. Tracked it down to an NFSv3 root client, bisection
> > points to f875a792abe9. A revert of that on top of
> > 5.1.0-rc5-00289-gee2725f711fa works fine.
>
> hi Doug,
> thanks for the report and for bisecting!
>
> I think this bug has been fixed, but the fix hasn't landed upstream yet.
> If you
> git pull git://git.linux-nfs.org/~bfields/linux.git nfs-next
>
> you should get
>
> Commit 3c86794ac0e6 ("nfsd/nfsd3_proc_readdir: fix buffer count and page pointers")
>
> which should fix this.
>
>
> Bruce: I don't see that patch in your "for-5.1" branch, but you said
> "Queuing up for 5.1."
> Did something go awry?

Sorry, I've got this confusing system where I commit first to a branch
named for-x.y-incoming, then push it to for-x.y after I've tested it.
Except sometimes I forget to push to the for-x.y branch until the last
minute.

The -incoming branches should always be included in nfsd-next.
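A minimal sketch of that two-stage flow in a throwaway repository (branch and
commit names are illustrative): patches land on the `-incoming` branch first,
and the public branch is fast-forwarded to it once testing is done.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q .
git config user.email "a@b"; git config user.name "test"
git commit -q --allow-empty -m "base"

git branch for-5.1                          # the published branch
git checkout -q -b for-5.1-incoming         # new patches land here first
git commit -q --allow-empty -m "nfsd: queued fix"

# ...after testing, publish by fast-forwarding the public branch:
git checkout -q for-5.1
git merge -q --ff-only for-5.1-incoming
git log -1 --format=%s                      # tip of for-5.1 is now the fix
```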

Anyway, I'll send a pull request now; it's overdue....

--b.