Hi Chuck,
While doing some testing today with pynfs on a branch based on your
nfsd-next branch. I'm not sure which test triggers it, but it's one of
the v4.0 tests. It only takes a few mins before it crashes:
Jul 19 19:22:43 kdevops-nfsd kernel: BUG: unable to handle page fault for address: ffffd8442d049108
Jul 19 19:22:43 kdevops-nfsd kernel: #PF: supervisor read access in kernel mode
Jul 19 19:22:43 kdevops-nfsd kernel: #PF: error_code(0x0000) - not-present page
Jul 19 19:22:43 kdevops-nfsd kernel: PGD 0 P4D 0
Jul 19 19:22:43 kdevops-nfsd kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jul 19 19:22:43 kdevops-nfsd kernel: CPU: 0 PID: 743 Comm: nfsd Not tainted 6.5.0-rc2+ #19
Jul 19 19:22:43 kdevops-nfsd kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108 CR3: 0000000277e1a001 CR4: 0000000000060ef0
Jul 19 19:22:43 kdevops-nfsd kernel: Call Trace:
Jul 19 19:22:43 kdevops-nfsd kernel: <TASK>
Jul 19 19:22:43 kdevops-nfsd kernel: ? __die+0x1f/0x70
Jul 19 19:22:43 kdevops-nfsd kernel: ? page_fault_oops+0x159/0x450
Jul 19 19:22:43 kdevops-nfsd kernel: ? fixup_exception+0x22/0x310
Jul 19 19:22:43 kdevops-nfsd kernel: ? exc_page_fault+0x157/0x180
Jul 19 19:22:43 kdevops-nfsd kernel: ? asm_exc_page_fault+0x22/0x30
Jul 19 19:22:43 kdevops-nfsd kernel: ? nfsd_cache_lookup+0x3c5/0x770 [nfsd]
Jul 19 19:22:43 kdevops-nfsd kernel: ? kfree+0x4b/0x110
Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_cache_lookup+0x3c5/0x770 [nfsd]
Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_dispatch+0x62/0x1b0 [nfsd]
Jul 19 19:22:43 kdevops-nfsd kernel: svc_process_common+0x3cb/0x6c0 [sunrpc]
Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
Jul 19 19:22:43 kdevops-nfsd kernel: svc_process+0x12d/0x170 [sunrpc]
Jul 19 19:22:43 kdevops-nfsd kernel: nfsd+0xd5/0x1a0 [nfsd]
Jul 19 19:22:43 kdevops-nfsd kernel: kthread+0xf3/0x120
Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork+0x30/0x50
Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork_asm+0x1b/0x30
Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0000:0x0
Jul 19 19:22:43 kdevops-nfsd kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
Jul 19 19:22:43 kdevops-nfsd kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jul 19 19:22:43 kdevops-nfsd kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Jul 19 19:22:43 kdevops-nfsd kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jul 19 19:22:43 kdevops-nfsd kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jul 19 19:22:43 kdevops-nfsd kernel: </TASK>
Jul 19 19:22:43 kdevops-nfsd kernel: Modules linked in: 9p netfs nls_iso8859_1 nls_cp437 vfat fat kvm_intel joydev kvm cirrus psmouse drm_shmem_helper irqbypass pcspkr virtio_net net_failover failover virtio_balloon >
Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108
Jul 19 19:22:43 kdevops-nfsd kernel: ---[ end trace 0000000000000000 ]---
Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffffffffffffd6 CR3: 0000000277e1a001 CR4: 0000000000060ef0
Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with irqs disabled
Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with preempt_count 1
faddr2line says:
$ ./scripts/faddr2line --list fs/nfsd/nfsd.ko nfsd_cache_lookup+0x3c5/0x770
nfsd_cacherep_free at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:116
111 }
112
113 static void nfsd_cacherep_free(struct nfsd_cacherep *rp)
114 {
115 kfree(rp->c_replvec.iov_base);
>116< kmem_cache_free(drc_slab, rp);
117 }
118
119 static unsigned long
120 nfsd_cacherep_dispose(struct list_head *dispose)
121 {
(inlined by) nfsd_reply_cache_free_locked at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:153
148 static void
149 nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
150 struct nfsd_net *nn)
151 {
152 nfsd_cacherep_unlink_locked(nn, b, rp);
>153< nfsd_cacherep_free(rp);
154 }
155
156 static void
157 nfsd_reply_cache_free(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
158 struct nfsd_net *nn)
(inlined by) nfsd_cache_lookup at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:527
522 nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
523 goto out;
524
525 found_entry:
526 /* We found a matching entry which is either in progress or done. */
>527< nfsd_reply_cache_free_locked(NULL, rp, nn);
528 nfsd_stats_rc_hits_inc();
529 rtn = RC_DROPIT;
530 rp = found;
531
532 /* Request being processed */
...and a bisect landed here:
767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a is the first bad commit
commit 767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a
Author: Chuck Lever <[email protected]>
Date: Sun Jul 9 11:45:16 2023 -0400
NFSD: Refactor nfsd_reply_cache_free_locked()
To reduce contention on the bucket locks, we must avoid calling
kfree() while each bucket lock is held.
Start by refactoring nfsd_reply_cache_free_locked() into a helper
that removes an entry from the bucket (and must therefore run under
the lock) and a second helper that frees the entry (which does not
need to hold the lock).
For readability, rename the helpers nfsd_cacherep_<verb>.
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
fs/nfsd/nfscache.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
--
Jeff Layton <[email protected]>
> On Jul 19, 2023, at 5:48 PM, Jeff Layton <[email protected]> wrote:
>
> Hi Chuck,
>
> While doing some testing today with pynfs on a branch based on your
> nfsd-next branch. I'm not sure which test triggers it, but it's one of
> the v4.0 tests.
I've just started running pynfs on nfsd-next, haven't seen any
crashes so far. I'll take a dive in tomorrow.
> It only takes a few mins before it crashes:
>
> Jul 19 19:22:43 kdevops-nfsd kernel: BUG: unable to handle page fault for address: ffffd8442d049108
> Jul 19 19:22:43 kdevops-nfsd kernel: #PF: supervisor read access in kernel mode
> Jul 19 19:22:43 kdevops-nfsd kernel: #PF: error_code(0x0000) - not-present page
> Jul 19 19:22:43 kdevops-nfsd kernel: PGD 0 P4D 0
> Jul 19 19:22:43 kdevops-nfsd kernel: Oops: 0000 [#1] PREEMPT SMP PTI
> Jul 19 19:22:43 kdevops-nfsd kernel: CPU: 0 PID: 743 Comm: nfsd Not tainted 6.5.0-rc2+ #19
> Jul 19 19:22:43 kdevops-nfsd kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
> Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
> Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
> Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
> Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
> Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
> Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
> Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
> Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
> Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
> Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108 CR3: 0000000277e1a001 CR4: 0000000000060ef0
> Jul 19 19:22:43 kdevops-nfsd kernel: Call Trace:
> Jul 19 19:22:43 kdevops-nfsd kernel: <TASK>
> Jul 19 19:22:43 kdevops-nfsd kernel: ? __die+0x1f/0x70
> Jul 19 19:22:43 kdevops-nfsd kernel: ? page_fault_oops+0x159/0x450
> Jul 19 19:22:43 kdevops-nfsd kernel: ? fixup_exception+0x22/0x310
> Jul 19 19:22:43 kdevops-nfsd kernel: ? exc_page_fault+0x157/0x180
> Jul 19 19:22:43 kdevops-nfsd kernel: ? asm_exc_page_fault+0x22/0x30
> Jul 19 19:22:43 kdevops-nfsd kernel: ? nfsd_cache_lookup+0x3c5/0x770 [nfsd]
> Jul 19 19:22:43 kdevops-nfsd kernel: ? kfree+0x4b/0x110
> Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_cache_lookup+0x3c5/0x770 [nfsd]
> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
> Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_dispatch+0x62/0x1b0 [nfsd]
> Jul 19 19:22:43 kdevops-nfsd kernel: svc_process_common+0x3cb/0x6c0 [sunrpc]
> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
> Jul 19 19:22:43 kdevops-nfsd kernel: svc_process+0x12d/0x170 [sunrpc]
> Jul 19 19:22:43 kdevops-nfsd kernel: nfsd+0xd5/0x1a0 [nfsd]
> Jul 19 19:22:43 kdevops-nfsd kernel: kthread+0xf3/0x120
> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
> Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork+0x30/0x50
> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
> Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork_asm+0x1b/0x30
> Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0000:0x0
> Jul 19 19:22:43 kdevops-nfsd kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
> Jul 19 19:22:43 kdevops-nfsd kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> Jul 19 19:22:43 kdevops-nfsd kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> Jul 19 19:22:43 kdevops-nfsd kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> Jul 19 19:22:43 kdevops-nfsd kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> Jul 19 19:22:43 kdevops-nfsd kernel: </TASK>
> Jul 19 19:22:43 kdevops-nfsd kernel: Modules linked in: 9p netfs nls_iso8859_1 nls_cp437 vfat fat kvm_intel joydev kvm cirrus psmouse drm_shmem_helper irqbypass pcspkr virtio_net net_failover failover virtio_balloon >
> Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108
> Jul 19 19:22:43 kdevops-nfsd kernel: ---[ end trace 0000000000000000 ]---
> Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
> Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
> Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
> Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
> Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
> Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
> Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
> Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
> Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
> Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffffffffffffd6 CR3: 0000000277e1a001 CR4: 0000000000060ef0
> Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with irqs disabled
> Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with preempt_count 1
>
> faddr2line says:
>
> $ ./scripts/faddr2line --list fs/nfsd/nfsd.ko nfsd_cache_lookup+0x3c5/0x770
> nfsd_cacherep_free at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:116
> 111 }
> 112
> 113 static void nfsd_cacherep_free(struct nfsd_cacherep *rp)
> 114 {
> 115 kfree(rp->c_replvec.iov_base);
>> 116< kmem_cache_free(drc_slab, rp);
> 117 }
> 118
> 119 static unsigned long
> 120 nfsd_cacherep_dispose(struct list_head *dispose)
> 121 {
>
> (inlined by) nfsd_reply_cache_free_locked at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:153
> 148 static void
> 149 nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
> 150 struct nfsd_net *nn)
> 151 {
> 152 nfsd_cacherep_unlink_locked(nn, b, rp);
>> 153< nfsd_cacherep_free(rp);
> 154 }
> 155
> 156 static void
> 157 nfsd_reply_cache_free(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
> 158 struct nfsd_net *nn)
>
> (inlined by) nfsd_cache_lookup at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:527
> 522 nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
> 523 goto out;
> 524
> 525 found_entry:
> 526 /* We found a matching entry which is either in progress or done. */
>> 527< nfsd_reply_cache_free_locked(NULL, rp, nn);
> 528 nfsd_stats_rc_hits_inc();
> 529 rtn = RC_DROPIT;
> 530 rp = found;
> 531
> 532 /* Request being processed */
>
>
> ...and a bisect landed here:
>
>
> 767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a is the first bad commit
> commit 767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a
> Author: Chuck Lever <[email protected]>
> Date: Sun Jul 9 11:45:16 2023 -0400
>
> NFSD: Refactor nfsd_reply_cache_free_locked()
>
> To reduce contention on the bucket locks, we must avoid calling
> kfree() while each bucket lock is held.
>
> Start by refactoring nfsd_reply_cache_free_locked() into a helper
> that removes an entry from the bucket (and must therefore run under
> the lock) and a second helper that frees the entry (which does not
> need to hold the lock).
>
> For readability, rename the helpers nfsd_cacherep_<verb>.
>
> Reviewed-by: Jeff Layton <[email protected]>
> Signed-off-by: Chuck Lever <[email protected]>
>
> fs/nfsd/nfscache.c | 26 +++++++++++++++++++-------
> 1 file changed, 19 insertions(+), 7 deletions(-)
>
> --
> Jeff Layton <[email protected]>
--
Chuck Lever
> On Jul 19, 2023, at 7:46 PM, Chuck Lever III <[email protected]> wrote:
>
>
>
>> On Jul 19, 2023, at 5:48 PM, Jeff Layton <[email protected]> wrote:
>>
>> Hi Chuck,
>>
>> While doing some testing today with pynfs on a branch based on your
>> nfsd-next branch. I'm not sure which test triggers it, but it's one of
>> the v4.0 tests.
>
> I've just started running pynfs on nfsd-next, haven't seen any
> crashes so far. I'll take a dive in tomorrow.
I've reproduced this, and kbot also appears to have hit it.
>> It only takes a few mins before it crashes:
>>
>> Jul 19 19:22:43 kdevops-nfsd kernel: BUG: unable to handle page fault for address: ffffd8442d049108
>> Jul 19 19:22:43 kdevops-nfsd kernel: #PF: supervisor read access in kernel mode
>> Jul 19 19:22:43 kdevops-nfsd kernel: #PF: error_code(0x0000) - not-present page
>> Jul 19 19:22:43 kdevops-nfsd kernel: PGD 0 P4D 0
>> Jul 19 19:22:43 kdevops-nfsd kernel: Oops: 0000 [#1] PREEMPT SMP PTI
>> Jul 19 19:22:43 kdevops-nfsd kernel: CPU: 0 PID: 743 Comm: nfsd Not tainted 6.5.0-rc2+ #19
>> Jul 19 19:22:43 kdevops-nfsd kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
>> Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
>> Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
>> Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
>> Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
>> Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
>> Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
>> Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
>> Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
>> Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
>> Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108 CR3: 0000000277e1a001 CR4: 0000000000060ef0
>> Jul 19 19:22:43 kdevops-nfsd kernel: Call Trace:
>> Jul 19 19:22:43 kdevops-nfsd kernel: <TASK>
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __die+0x1f/0x70
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? page_fault_oops+0x159/0x450
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? fixup_exception+0x22/0x310
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? exc_page_fault+0x157/0x180
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? asm_exc_page_fault+0x22/0x30
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? nfsd_cache_lookup+0x3c5/0x770 [nfsd]
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? kfree+0x4b/0x110
>> Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_cache_lookup+0x3c5/0x770 [nfsd]
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
>> Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_dispatch+0x62/0x1b0 [nfsd]
>> Jul 19 19:22:43 kdevops-nfsd kernel: svc_process_common+0x3cb/0x6c0 [sunrpc]
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
>> Jul 19 19:22:43 kdevops-nfsd kernel: svc_process+0x12d/0x170 [sunrpc]
>> Jul 19 19:22:43 kdevops-nfsd kernel: nfsd+0xd5/0x1a0 [nfsd]
>> Jul 19 19:22:43 kdevops-nfsd kernel: kthread+0xf3/0x120
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
>> Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork+0x30/0x50
>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
>> Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork_asm+0x1b/0x30
>> Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0000:0x0
>> Jul 19 19:22:43 kdevops-nfsd kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
>> Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
>> Jul 19 19:22:43 kdevops-nfsd kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>> Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> Jul 19 19:22:43 kdevops-nfsd kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>> Jul 19 19:22:43 kdevops-nfsd kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>> Jul 19 19:22:43 kdevops-nfsd kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>> Jul 19 19:22:43 kdevops-nfsd kernel: </TASK>
>> Jul 19 19:22:43 kdevops-nfsd kernel: Modules linked in: 9p netfs nls_iso8859_1 nls_cp437 vfat fat kvm_intel joydev kvm cirrus psmouse drm_shmem_helper irqbypass pcspkr virtio_net net_failover failover virtio_balloon >
>> Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108
>> Jul 19 19:22:43 kdevops-nfsd kernel: ---[ end trace 0000000000000000 ]---
>> Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
>> Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
>> Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
>> Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
>> Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
>> Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
>> Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
>> Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
>> Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
>> Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffffffffffffd6 CR3: 0000000277e1a001 CR4: 0000000000060ef0
>> Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with irqs disabled
>> Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with preempt_count 1
>>
>> faddr2line says:
>>
>> $ ./scripts/faddr2line --list fs/nfsd/nfsd.ko nfsd_cache_lookup+0x3c5/0x770
>> nfsd_cacherep_free at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:116
>> 111 }
>> 112
>> 113 static void nfsd_cacherep_free(struct nfsd_cacherep *rp)
>> 114 {
>> 115 kfree(rp->c_replvec.iov_base);
>>> 116< kmem_cache_free(drc_slab, rp);
>> 117 }
>> 118
>> 119 static unsigned long
>> 120 nfsd_cacherep_dispose(struct list_head *dispose)
>> 121 {
>>
>> (inlined by) nfsd_reply_cache_free_locked at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:153
>> 148 static void
>> 149 nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
>> 150 struct nfsd_net *nn)
>> 151 {
>> 152 nfsd_cacherep_unlink_locked(nn, b, rp);
>>> 153< nfsd_cacherep_free(rp);
>> 154 }
>> 155
>> 156 static void
>> 157 nfsd_reply_cache_free(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
>> 158 struct nfsd_net *nn)
>>
>> (inlined by) nfsd_cache_lookup at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:527
>> 522 nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
>> 523 goto out;
>> 524
>> 525 found_entry:
>> 526 /* We found a matching entry which is either in progress or done. */
>>> 527< nfsd_reply_cache_free_locked(NULL, rp, nn);
>> 528 nfsd_stats_rc_hits_inc();
>> 529 rtn = RC_DROPIT;
>> 530 rp = found;
>> 531
>> 532 /* Request being processed */
>>
>>
>> ...and a bisect landed here:
>>
>>
>> 767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a is the first bad commit
>> commit 767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a
>> Author: Chuck Lever <[email protected]>
>> Date: Sun Jul 9 11:45:16 2023 -0400
>>
>> NFSD: Refactor nfsd_reply_cache_free_locked()
>>
>> To reduce contention on the bucket locks, we must avoid calling
>> kfree() while each bucket lock is held.
>>
>> Start by refactoring nfsd_reply_cache_free_locked() into a helper
>> that removes an entry from the bucket (and must therefore run under
>> the lock) and a second helper that frees the entry (which does not
>> need to hold the lock).
>>
>> For readability, rename the helpers nfsd_cacherep_<verb>.
>>
>> Reviewed-by: Jeff Layton <[email protected]>
>> Signed-off-by: Chuck Lever <[email protected]>
>>
>> fs/nfsd/nfscache.c | 26 +++++++++++++++++++-------
>> 1 file changed, 19 insertions(+), 7 deletions(-)
>>
>> --
>> Jeff Layton <[email protected]>
>
> --
> Chuck Lever
--
Chuck Lever
On Wed, 2023-07-19 at 23:46 +0000, Chuck Lever III wrote:
>
> > On Jul 19, 2023, at 5:48 PM, Jeff Layton <[email protected]> wrote:
> >
> > Hi Chuck,
> >
> > While doing some testing today with pynfs on a branch based on your
> > nfsd-next branch. I'm not sure which test triggers it, but it's one of
> > the v4.0 tests.
>
> I've just started running pynfs on nfsd-next, haven't seen any
> crashes so far. I'll take a dive in tomorrow.
>
>
> > It only takes a few mins before it crashes:
> >
> > Jul 19 19:22:43 kdevops-nfsd kernel: BUG: unable to handle page fault for address: ffffd8442d049108
> > Jul 19 19:22:43 kdevops-nfsd kernel: #PF: supervisor read access in kernel mode
> > Jul 19 19:22:43 kdevops-nfsd kernel: #PF: error_code(0x0000) - not-present page
> > Jul 19 19:22:43 kdevops-nfsd kernel: PGD 0 P4D 0
> > Jul 19 19:22:43 kdevops-nfsd kernel: Oops: 0000 [#1] PREEMPT SMP PTI
> > Jul 19 19:22:43 kdevops-nfsd kernel: CPU: 0 PID: 743 Comm: nfsd Not tainted 6.5.0-rc2+ #19
> > Jul 19 19:22:43 kdevops-nfsd kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
> > Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
> > Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
> > Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
> > Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
> > Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
> > Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
> > Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
> > Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
> > Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
> > Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108 CR3: 0000000277e1a001 CR4: 0000000000060ef0
> > Jul 19 19:22:43 kdevops-nfsd kernel: Call Trace:
> > Jul 19 19:22:43 kdevops-nfsd kernel: <TASK>
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? __die+0x1f/0x70
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? page_fault_oops+0x159/0x450
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? fixup_exception+0x22/0x310
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? exc_page_fault+0x157/0x180
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? asm_exc_page_fault+0x22/0x30
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? nfsd_cache_lookup+0x3c5/0x770 [nfsd]
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? kfree+0x4b/0x110
> > Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_cache_lookup+0x3c5/0x770 [nfsd]
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
> > Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_dispatch+0x62/0x1b0 [nfsd]
> > Jul 19 19:22:43 kdevops-nfsd kernel: svc_process_common+0x3cb/0x6c0 [sunrpc]
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
> > Jul 19 19:22:43 kdevops-nfsd kernel: svc_process+0x12d/0x170 [sunrpc]
> > Jul 19 19:22:43 kdevops-nfsd kernel: nfsd+0xd5/0x1a0 [nfsd]
> > Jul 19 19:22:43 kdevops-nfsd kernel: kthread+0xf3/0x120
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
> > Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork+0x30/0x50
> > Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
> > Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork_asm+0x1b/0x30
> > Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0000:0x0
> > Jul 19 19:22:43 kdevops-nfsd kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> > Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
> > Jul 19 19:22:43 kdevops-nfsd kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > Jul 19 19:22:43 kdevops-nfsd kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > Jul 19 19:22:43 kdevops-nfsd kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > Jul 19 19:22:43 kdevops-nfsd kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > Jul 19 19:22:43 kdevops-nfsd kernel: </TASK>
> > Jul 19 19:22:43 kdevops-nfsd kernel: Modules linked in: 9p netfs nls_iso8859_1 nls_cp437 vfat fat kvm_intel joydev kvm cirrus psmouse drm_shmem_helper irqbypass pcspkr virtio_net net_failover failover virtio_balloon >
> > Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108
> > Jul 19 19:22:43 kdevops-nfsd kernel: ---[ end trace 0000000000000000 ]---
> > Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
> > Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
> > Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
> > Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
> > Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
> > Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
> > Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
> > Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
> > Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
> > Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffffffffffffd6 CR3: 0000000277e1a001 CR4: 0000000000060ef0
> > Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with irqs disabled
> > Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with preempt_count 1
> >
> > faddr2line says:
> >
> > $ ./scripts/faddr2line --list fs/nfsd/nfsd.ko nfsd_cache_lookup+0x3c5/0x770
> > nfsd_cacherep_free at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:116
> > 111 }
> > 112
> > 113 static void nfsd_cacherep_free(struct nfsd_cacherep *rp)
> > 114 {
> > 115 kfree(rp->c_replvec.iov_base);
> > > 116< kmem_cache_free(drc_slab, rp);
> > 117 }
> > 118
> > 119 static unsigned long
> > 120 nfsd_cacherep_dispose(struct list_head *dispose)
> > 121 {
> >
> > (inlined by) nfsd_reply_cache_free_locked at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:153
> > 148 static void
> > 149 nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
> > 150 struct nfsd_net *nn)
> > 151 {
> > 152 nfsd_cacherep_unlink_locked(nn, b, rp);
> > > 153< nfsd_cacherep_free(rp);
> > 154 }
> > 155
> > 156 static void
> > 157 nfsd_reply_cache_free(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
> > 158 struct nfsd_net *nn)
> >
> > (inlined by) nfsd_cache_lookup at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:527
> > 522 nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
> > 523 goto out;
> > 524
> > 525 found_entry:
> > 526 /* We found a matching entry which is either in progress or done. */
> > > 527< nfsd_reply_cache_free_locked(NULL, rp, nn);
> > 528 nfsd_stats_rc_hits_inc();
> > 529 rtn = RC_DROPIT;
> > 530 rp = found;
> > 531
> > 532 /* Request being processed */
> >
> >
> > ...and a bisect landed here:
> >
> >
> > 767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a is the first bad commit
> > commit 767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a
> > Author: Chuck Lever <[email protected]>
> > Date: Sun Jul 9 11:45:16 2023 -0400
> >
> > NFSD: Refactor nfsd_reply_cache_free_locked()
> >
> > To reduce contention on the bucket locks, we must avoid calling
> > kfree() while each bucket lock is held.
> >
> > Start by refactoring nfsd_reply_cache_free_locked() into a helper
> > that removes an entry from the bucket (and must therefore run under
> > the lock) and a second helper that frees the entry (which does not
> > need to hold the lock).
> >
> > For readability, rename the helpers nfsd_cacherep_<verb>.
> >
> > Reviewed-by: Jeff Layton <[email protected]>
> > Signed-off-by: Chuck Lever <[email protected]>
> >
> > fs/nfsd/nfscache.c | 26 +++++++++++++++++++-------
> > 1 file changed, 19 insertions(+), 7 deletions(-)
> >
> > --
> > Jeff Layton <[email protected]>
>
So far, it seems to be surviving with this patch on top. c_replvec is
part of a union, so you need to ensure it actually holds a kvec.
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 6eb3d7bdfaf3..80621a709510 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -112,7 +112,8 @@ nfsd_cacherep_alloc(struct svc_rqst *rqstp, __wsum csum,
static void nfsd_cacherep_free(struct nfsd_cacherep *rp)
{
- kfree(rp->c_replvec.iov_base);
+ if (rp->c_type == RC_REPLBUFF)
+ kfree(rp->c_replvec.iov_base);
kmem_cache_free(drc_slab, rp);
}
--
2.41.0
> On Jul 20, 2023, at 10:33 AM, Jeff Layton <[email protected]> wrote:
>
> On Wed, 2023-07-19 at 23:46 +0000, Chuck Lever III wrote:
>>
>>> On Jul 19, 2023, at 5:48 PM, Jeff Layton <[email protected]> wrote:
>>>
>>> Hi Chuck,
>>>
>>> While doing some testing today with pynfs on a branch based on your
>>> nfsd-next branch. I'm not sure which test triggers it, but it's one of
>>> the v4.0 tests.
>>
>> I've just started running pynfs on nfsd-next, haven't seen any
>> crashes so far. I'll take a dive in tomorrow.
>>
>>
>>> It only takes a few mins before it crashes:
>>>
>>> Jul 19 19:22:43 kdevops-nfsd kernel: BUG: unable to handle page fault for address: ffffd8442d049108
>>> Jul 19 19:22:43 kdevops-nfsd kernel: #PF: supervisor read access in kernel mode
>>> Jul 19 19:22:43 kdevops-nfsd kernel: #PF: error_code(0x0000) - not-present page
>>> Jul 19 19:22:43 kdevops-nfsd kernel: PGD 0 P4D 0
>>> Jul 19 19:22:43 kdevops-nfsd kernel: Oops: 0000 [#1] PREEMPT SMP PTI
>>> Jul 19 19:22:43 kdevops-nfsd kernel: CPU: 0 PID: 743 Comm: nfsd Not tainted 6.5.0-rc2+ #19
>>> Jul 19 19:22:43 kdevops-nfsd kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
>>> Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
>>> Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
>>> Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
>>> Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
>>> Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108 CR3: 0000000277e1a001 CR4: 0000000000060ef0
>>> Jul 19 19:22:43 kdevops-nfsd kernel: Call Trace:
>>> Jul 19 19:22:43 kdevops-nfsd kernel: <TASK>
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __die+0x1f/0x70
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? page_fault_oops+0x159/0x450
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? fixup_exception+0x22/0x310
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? exc_page_fault+0x157/0x180
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? asm_exc_page_fault+0x22/0x30
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? nfsd_cache_lookup+0x3c5/0x770 [nfsd]
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? kfree+0x4b/0x110
>>> Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_cache_lookup+0x3c5/0x770 [nfsd]
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
>>> Jul 19 19:22:43 kdevops-nfsd kernel: nfsd_dispatch+0x62/0x1b0 [nfsd]
>>> Jul 19 19:22:43 kdevops-nfsd kernel: svc_process_common+0x3cb/0x6c0 [sunrpc]
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_nfsd+0x10/0x10 [nfsd]
>>> Jul 19 19:22:43 kdevops-nfsd kernel: svc_process+0x12d/0x170 [sunrpc]
>>> Jul 19 19:22:43 kdevops-nfsd kernel: nfsd+0xd5/0x1a0 [nfsd]
>>> Jul 19 19:22:43 kdevops-nfsd kernel: kthread+0xf3/0x120
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork+0x30/0x50
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ? __pfx_kthread+0x10/0x10
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ret_from_fork_asm+0x1b/0x30
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0000:0x0
>>> Jul 19 19:22:43 kdevops-nfsd kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>>> Jul 19 19:22:43 kdevops-nfsd kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>> Jul 19 19:22:43 kdevops-nfsd kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>> Jul 19 19:22:43 kdevops-nfsd kernel: </TASK>
>>> Jul 19 19:22:43 kdevops-nfsd kernel: Modules linked in: 9p netfs nls_iso8859_1 nls_cp437 vfat fat kvm_intel joydev kvm cirrus psmouse drm_shmem_helper irqbypass pcspkr virtio_net net_failover failover virtio_balloon >
>>> Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffd8442d049108
>>> Jul 19 19:22:43 kdevops-nfsd kernel: ---[ end trace 0000000000000000 ]---
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RIP: 0010:kfree+0x4b/0x110
>>> Jul 19 19:22:43 kdevops-nfsd kernel: Code: 80 48 01 d8 0f 82 d8 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 9f d7 fb 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 7d d7 fb 00 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 9f 00 00 >
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RSP: 0018:ffffb0858661bdc0 EFLAGS: 00010286
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RAX: ffffd8442d049100 RBX: 0000000081244052 RCX: ffff8a665e003008
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RDX: 0000759a40000000 RSI: ffffffffc0b13a45 RDI: 0000000081244052
>>> Jul 19 19:22:43 kdevops-nfsd kernel: RBP: ffffb0858661be40 R08: ffff8a665e003008 R09: ffffffff8ebdc4ec
>>> Jul 19 19:22:43 kdevops-nfsd kernel: R10: 000000000000000e R11: 000000000000001b R12: ffff8a6656f20150
>>> Jul 19 19:22:43 kdevops-nfsd kernel: R13: ffff8a665e003008 R14: ffff8a6656f20100 R15: ffff8a6656f20980
>>> Jul 19 19:22:43 kdevops-nfsd kernel: FS: 0000000000000000(0000) GS:ffff8a67bfc00000(0000) knlGS:0000000000000000
>>> Jul 19 19:22:43 kdevops-nfsd kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Jul 19 19:22:43 kdevops-nfsd kernel: CR2: ffffffffffffffd6 CR3: 0000000277e1a001 CR4: 0000000000060ef0
>>> Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with irqs disabled
>>> Jul 19 19:22:43 kdevops-nfsd kernel: note: nfsd[743] exited with preempt_count 1
>>>
>>> faddr2line says:
>>>
>>> $ ./scripts/faddr2line --list fs/nfsd/nfsd.ko nfsd_cache_lookup+0x3c5/0x770
>>> nfsd_cacherep_free at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:116
>>> 111 }
>>> 112
>>> 113 static void nfsd_cacherep_free(struct nfsd_cacherep *rp)
>>> 114 {
>>> 115 kfree(rp->c_replvec.iov_base);
>>>> 116< kmem_cache_free(drc_slab, rp);
>>> 117 }
>>> 118
>>> 119 static unsigned long
>>> 120 nfsd_cacherep_dispose(struct list_head *dispose)
>>> 121 {
>>>
>>> (inlined by) nfsd_reply_cache_free_locked at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:153
>>> 148 static void
>>> 149 nfsd_reply_cache_free_locked(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
>>> 150 struct nfsd_net *nn)
>>> 151 {
>>> 152 nfsd_cacherep_unlink_locked(nn, b, rp);
>>>> 153< nfsd_cacherep_free(rp);
>>> 154 }
>>> 155
>>> 156 static void
>>> 157 nfsd_reply_cache_free(struct nfsd_drc_bucket *b, struct nfsd_cacherep *rp,
>>> 158 struct nfsd_net *nn)
>>>
>>> (inlined by) nfsd_cache_lookup at /home/jlayton/git/kdevops/linux/fs/nfsd/nfscache.c:527
>>> 522 nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
>>> 523 goto out;
>>> 524
>>> 525 found_entry:
>>> 526 /* We found a matching entry which is either in progress or done. */
>>>> 527< nfsd_reply_cache_free_locked(NULL, rp, nn);
>>> 528 nfsd_stats_rc_hits_inc();
>>> 529 rtn = RC_DROPIT;
>>> 530 rp = found;
>>> 531
>>> 532 /* Request being processed */
>>>
>>>
>>> ...and a bisect landed here:
>>>
>>>
>>> 767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a is the first bad commit
>>> commit 767b1d5badd6eb418e3f91f0cd8fa6d2894ff43a
>>> Author: Chuck Lever <[email protected]>
>>> Date: Sun Jul 9 11:45:16 2023 -0400
>>>
>>> NFSD: Refactor nfsd_reply_cache_free_locked()
>>>
>>> To reduce contention on the bucket locks, we must avoid calling
>>> kfree() while each bucket lock is held.
>>>
>>> Start by refactoring nfsd_reply_cache_free_locked() into a helper
>>> that removes an entry from the bucket (and must therefore run under
>>> the lock) and a second helper that frees the entry (which does not
>>> need to hold the lock).
>>>
>>> For readability, rename the helpers nfsd_cacherep_<verb>.
>>>
>>> Reviewed-by: Jeff Layton <[email protected]>
>>> Signed-off-by: Chuck Lever <[email protected]>
>>>
>>> fs/nfsd/nfscache.c | 26 +++++++++++++++++++-------
>>> 1 file changed, 19 insertions(+), 7 deletions(-)
>>>
>>> --
>>> Jeff Layton <[email protected]>
>>
>
> So far, it seems to be surviving with this patch on top. c_replvec is
> part of a union, so you need to ensure it actually holds a kvec.
>
>
> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
> index 6eb3d7bdfaf3..80621a709510 100644
> --- a/fs/nfsd/nfscache.c
> +++ b/fs/nfsd/nfscache.c
> @@ -112,7 +112,8 @@ nfsd_cacherep_alloc(struct svc_rqst *rqstp, __wsum csum,
>
> static void nfsd_cacherep_free(struct nfsd_cacherep *rp)
> {
> - kfree(rp->c_replvec.iov_base);
> + if (rp->c_type == RC_REPLBUFF)
> + kfree(rp->c_replvec.iov_base);
> kmem_cache_free(drc_slab, rp);
> }
Agreed, that was an unintended change to the logic introduced by
that commit, and it lines up with the RIP.
--
Chuck Lever