After recently upgrading the kernel on one of my servers I was getting
the below crashes. Tracked it down to an NFSv3 root client, bisection
points to f875a792abe9. A revert of that on top of
5.1.0-rc5-00289-gee2725f711fa works fine.
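For reference, the bisection went roughly like this (a sketch only; the good endpoint and the test step are whatever matches your setup):

```shell
# Sketch of the bisection between a known-good kernel and the failing one.
# Using v5.0 as the good endpoint is an assumption; use whatever last worked.
git bisect start
git bisect bad ee2725f711fa    # 5.1.0-rc5-00289, crashes
git bisect good v5.0
# At each step: build the kernel, boot the NFS server, exercise the
# NFSv3 root client, then report the result:
#   git bisect good    (or: git bisect bad)
# Repeat until git names f875a792abe9 as the first bad commit, then:
git bisect reset
```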
Relevant exports:
/storage2/cubox/root
192.168.21.0/24(sec=sys,rw,no_root_squash,insecure,nohide,no_subtree_check,crossmnt,async)
/
*(sec=krb5,rw,fsid=0,no_root_squash,insecure,no_subtree_check,crossmnt,async)
/storage2
*(sec=krb5,rw,no_root_squash,insecure,nohide,no_subtree_check,crossmnt,async)
Client has the following mounts to this server (running a modified 4.9.124):
192.168.21.90:/storage2/cubox/root on / type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.21.90,mountvers=3,mountproto=tcp,local_lock=all,addr=192.168.21.90)
192.168.21.90:/storage2/cubox/root/var/tmp on /var/tmp type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=192.168.21.242,local_lock=none,addr=192.168.21.90)
Thanks,
Doug
Initial BUG, seen with HEAD when I started testing:
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294654] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294772] #PF error: [normal kernel read fault]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294776] PGD 0 P4D 0
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294780] Oops: 0000 [#1] SMP NOPTI
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294784] CPU: 1 PID: 5459 Comm: nfsd Not tainted 5.1.0-rc5-00289-gee2725f711fa #67
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294787] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78LMT-S2/GA-MA78LMT-S2, BIOS F14 08/16/2011
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294794] RIP: 0010:do_tcp_sendpages+0x308/0x5d0
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294799] Code: ff 48 63 e8 48 8b 5c 24 40 65 48 33 1c 25 28 00 00 00 48 89 e8 0f 85 73 02 00 00 48 83 c4 48 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <49> 8b 57 08 48 8d 42 ff 83 e2 01 49 0f 44 c7 f0 ff 40 34 8b 91 b8
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294804] RSP: 0018:ffffc90000437c90 EFLAGS: 00010202
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294808] RAX: 0000000000000001 RBX: 0000000000028000 RCX: ffff888210505600
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294811] RDX: ffffffff8154b1a0 RSI: 0000000000001000 RDI: ffff888210316e80
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294814] RBP: 0000000000000000 R08: 0000000000001000 R09: 0000000000000003
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294817] R10: 0000000000000002 R11: 0000000000000005 R12: 0000000000001000
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294820] R13: 00000000000005a8 R14: ffff888210316e80 R15: 0000000000000000
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294823] FS: 0000000000000000(0000) GS:ffff888217a40000(0000) knlGS:0000000000000000
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294826] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294829] CR2: 0000000000000008 CR3: 00000002150aa000 CR4: 00000000000006e0
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294832] Call Trace:
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294837]  tcp_sendpage_locked+0x42/0x60
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294841]  tcp_sendpage+0x38/0x50
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294844]  inet_sendpage+0x3a/0xe0
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294856]  ? encode_post_op_attr.isra.4+0xf4/0x100 [nfsd]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294865]  kernel_sendpage+0x15/0x30
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294878]  svc_send_common+0xed/0x150 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294887]  svc_sendto+0xf6/0x1a0 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294892]  ? refcount_dec_and_lock+0x18/0x70
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294899]  ? auth_domain_put+0x10/0x60 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294907]  ? svc_authorise+0x34/0x40 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294915]  ? svc_process_common.isra.8+0x10b/0x6a0 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294923]  svc_tcp_sendto+0x29/0x40 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294930]  svc_send+0x94/0x190 [sunrpc]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.294990]  ? nfsd_destroy+0x50/0x50 [nfsd]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.295049]  nfsd+0xdd/0x140 [nfsd]
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.295054]  kthread+0x10e/0x130
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.295058]  ? kthread_park+0x80/0x80
Apr 20 05:48:10 wraith.dragoninc.ca [ 968.295062]  ret_from_fork+0x22/0x40
Sometimes during bisection I saw this error instead:
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972318] BUG: unable to handle kernel paging request at ffff888214c15000
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972336] #PF error: [PROT] [WRITE]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972339] PGD 2201067 P4D 2201067 PUD 2204067 PMD 2155bd063 PTE 8000000214c15161
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972344] Oops: 0003 [#1] SMP NOPTI
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972349] CPU: 1 PID: 5456 Comm: nfsd Not tainted 5.0.0-11058-ga840b56ba385 #79
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972352] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78LMT-S2/GA-MA78LMT-S2, BIOS F14 08/16/2011
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972360] RIP: 0010:__memmove+0x57/0x1a0
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972363] Code: 00 72 05 40 38 fe 74 3b 48 83 ea 20 48 83 ea 20 4c 8b 1e 4c 8b 56 08 4c 8b 4e 10 4c 8b 46 18 48 8d 76 20 4c 89 1f 4c 89 57 08 <4c> 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972370] RSP: 0018:ffffc90000437d48 EFLAGS: 00010286
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972373] RAX: ffff88821498400c RBX: ffffffffdfc7fff4 RCX: ffff888214984024
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972377] RDX: ffffffffdf9eefd4 RSI: ffff8882121b8000 RDI: ffff888214c14fec
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972379] RBP: 000000002038002c R08: 0000000000000000 R09: 0000000000000000
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972382] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972385] R13: ffff888214e692d4 R14: ffff888211f27000 R15: 0000000000000007
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972388] FS: 0000000000000000(0000) GS:ffff888217a40000(0000) knlGS:0000000000000000
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972391] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972394] CR2: ffff888214c15000 CR3: 0000000210996000 CR4: 00000000000006e0
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972397] Call Trace:
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972413]  encode_entry.isra.8+0x1bc/0x310 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972422]  nfsd_readdir+0x184/0x230 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972430]  ? nfs3svc_encode_readdirres+0xc0/0xc0 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972437]  ? nfsd_finish_read+0x190/0x190 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972445]  nfsd3_proc_readdir+0x84/0x120 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972452]  nfsd_dispatch+0xc1/0x1b0 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972465]  svc_process_common.isra.8+0x355/0x6a0 [sunrpc]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972474]  ? svc_recv+0x2ac/0x820 [sunrpc]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972480]  ? nfsd_destroy+0x50/0x50 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972488]  svc_process+0xd2/0x110 [sunrpc]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972494]  nfsd+0xdd/0x140 [nfsd]
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972499]  kthread+0x10e/0x130
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972502]  ? kthread_park+0x80/0x80
Apr 20 21:45:56 wraith.dragoninc.ca [ 365.972506]  ret_from_fork+0x22/0x40
On Sat, Apr 20 2019, Doug Nazar wrote:
> After recently upgrading the kernel on one of my servers I was getting
> the below crashes. Tracked it down to an NFSv3 root client, bisection
> points to f875a792abe9. A revert of that on top of
> 5.1.0-rc5-00289-gee2725f711fa works fine.
hi Doug,
thanks for the report and for bisecting!
I think this bug has been fixed, but the fix hasn't landed upstream yet.
If you
git pull git://git.linux-nfs.org/~bfields/linux.git nfs-next
you should get
Commit 3c86794ac0e6 ("nfsd/nfsd3_proc_readdir: fix buffer count and page pointers")
which should fix this.
Bruce: I don't see that patch in your "for-5.1" branch, but you said
"Queuing up for 5.1."
Did something go awry?
Thanks,
NeilBrown
On 2019-04-22 20:21, NeilBrown wrote:
> I think this bug has been fixed, but the fix hasn't landed upstream yet.
> If you
> git pull git://git.linux-nfs.org/~bfields/linux.git nfs-next
s/nfs-next/nfsd-next/ ;-)
Seems to work fine. I didn't test that particular patch in isolation, though, just the whole tree.
I did poke around a bit in the git web interface before, but didn't see
anything that jumped out at me.
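(For the record, testing just that one commit on top of the previously-failing tree would look something like the below; a hypothetical sketch, not something I ran:

```shell
# Hypothetical: apply only the fix on top of the tree that crashed.
git checkout ee2725f711fa
git fetch git://git.linux-nfs.org/~bfields/linux.git nfsd-next
git cherry-pick 3c86794ac0e6   # "nfsd/nfsd3_proc_readdir: fix buffer count and page pointers"
# then rebuild, reboot, and re-run the NFSv3 readdir workload
```
)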
Thanks,
Doug
On Tue, Apr 23, 2019 at 10:21:53AM +1000, NeilBrown wrote:
> On Sat, Apr 20 2019, Doug Nazar wrote:
>
> > After recently upgrading the kernel on one of my servers I was getting
> > the below crashes. Tracked it down to an NFSv3 root client, bisection
> > points to f875a792abe9. A revert of that on top of
> > 5.1.0-rc5-00289-gee2725f711fa works fine.
>
> hi Doug,
> thanks for the report and for bisecting!
>
> I think this bug has been fixed, but the fix hasn't landed upstream yet.
> If you
> git pull git://git.linux-nfs.org/~bfields/linux.git nfs-next
>
> you should get
>
> Commit 3c86794ac0e6 ("nfsd/nfsd3_proc_readdir: fix buffer count and page pointers")
>
> which should fix this.
>
>
> Bruce: I don't see that patch in your "for-5.1" branch, but you said
> "Queuing up for 5.1."
> Did something go awry?
Sorry, I've got this confusing system where I commit first to a branch
named for-x.y-incoming, then push it to for-x.y after I've tested it.
Except sometimes I forget to push to the for-x.y branch until the last
minute.
The -incoming branches should always be included in nfsd-next.
Anyway, I'll send a pull request now, it's overdue....
--b.