2023-09-27 19:16:04

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] nfs: decrement nrequests counter before releasing the req

On Tue, 2023-09-19 at 09:17 -0400, Jeff Layton wrote:
> I hit this panic in testing:
>
> [ 6235.500016] run fstests generic/464 at 2023-09-18 22:51:24
> [ 6288.410761] BUG: kernel NULL pointer dereference, address:
> 0000000000000000
> [ 6288.412174] #PF: supervisor read access in kernel mode
> [ 6288.413160] #PF: error_code(0x0000) - not-present page
> [ 6288.413992] PGD 0 P4D 0
> [ 6288.414603] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 6288.415419] CPU: 0 PID: 340798 Comm: kworker/u18:8 Not tainted
> 6.6.0-rc1-gdcf620ceebac #95
> [ 6288.416538] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.16.2-1.fc38 04/01/2014
> [ 6288.417701] Workqueue: nfsiod rpc_async_release [sunrpc]
> [ 6288.418676] RIP: 0010:nfs_inode_remove_request+0xc8/0x150 [nfs]
> [ 6288.419836] Code: ff ff 48 8b 43 38 48 8b 7b 10 a8 04 74 5b 48 85
> ff 74 56 48 8b 07 a9 00 00 08 00 74 58 48 8b 07 f6 c4 10 74 50 e8 c8
> 44 b3 d5 <48> 8b 00 f0 48 ff 88 30 ff ff ff 5b 5d 41 5c c3 cc cc cc
> cc 48 8b
> [ 6288.422389] RSP: 0018:ffffbd618353bda8 EFLAGS: 00010246
> [ 6288.423234] RAX: 0000000000000000 RBX: ffff9a29f9a25280 RCX:
> 0000000000000000
> [ 6288.424351] RDX: ffff9a29f9a252b4 RSI: 000000000000000b RDI:
> ffffef41448e3840
> [ 6288.425345] RBP: ffffef41448e3840 R08: 0000000000000038 R09:
> ffffffffffffffff
> [ 6288.426334] R10: 0000000000033f80 R11: ffff9a2a7fffa000 R12:
> ffff9a29093f98c4
> [ 6288.427353] R13: 0000000000000000 R14: ffff9a29230f62e0 R15:
> ffff9a29230f62d0
> [ 6288.428358] FS:  0000000000000000(0000) GS:ffff9a2a77c00000(0000)
> knlGS:0000000000000000
> [ 6288.429513] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6288.430427] CR2: 0000000000000000 CR3: 0000000264748002 CR4:
> 0000000000770ef0
> [ 6288.431553] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 6288.432715] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 6288.433698] PKRU: 55555554
> [ 6288.434196] Call Trace:
> [ 6288.434667]  <TASK>
> [ 6288.435132]  ? __die+0x1f/0x70
> [ 6288.435723]  ? page_fault_oops+0x159/0x450
> [ 6288.436389]  ? try_to_wake_up+0x98/0x5d0
> [ 6288.437044]  ? do_user_addr_fault+0x65/0x660
> [ 6288.437728]  ? exc_page_fault+0x7a/0x180
> [ 6288.438368]  ? asm_exc_page_fault+0x22/0x30
> [ 6288.439137]  ? nfs_inode_remove_request+0xc8/0x150 [nfs]
> [ 6288.440112]  ? nfs_inode_remove_request+0xa0/0x150 [nfs]
> [ 6288.440924]  nfs_commit_release_pages+0x16e/0x340 [nfs]
> [ 6288.441700]  ? __pfx_call_transmit+0x10/0x10 [sunrpc]
> [ 6288.442475]  ? _raw_spin_lock_irqsave+0x23/0x50
> [ 6288.443161]  nfs_commit_release+0x15/0x40 [nfs]
> [ 6288.443926]  rpc_free_task+0x36/0x60 [sunrpc]
> [ 6288.444741]  rpc_async_release+0x29/0x40 [sunrpc]
> [ 6288.445509]  process_one_work+0x171/0x340
> [ 6288.446135]  worker_thread+0x277/0x3a0
> [ 6288.446724]  ? __pfx_worker_thread+0x10/0x10
> [ 6288.447376]  kthread+0xf0/0x120
> [ 6288.447903]  ? __pfx_kthread+0x10/0x10
> [ 6288.448500]  ret_from_fork+0x2d/0x50
> [ 6288.449078]  ? __pfx_kthread+0x10/0x10
> [ 6288.449665]  ret_from_fork_asm+0x1b/0x30
> [ 6288.450283]  </TASK>
> [ 6288.450688] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver nfs lockd grace sunrpc nls_iso8859_1 nls_cp437 vfat fat
> 9p netfs ext4 kvm_intel crc16 mbcache jbd2 joydev kvm xfs irqbypass
> virtio_net pcspkr net_failover psmouse failover 9pnet_virtio cirrus
> drm_shmem_helper virtio_balloon drm_kms_helper button evdev drm loop
> dm_mod zram zsmalloc crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel sha512_ssse3 sha512_generic virtio_blk nvme
> aesni_intel crypto_simd cryptd nvme_core t10_pi i6300esb
> crc64_rocksoft_generic crc64_rocksoft crc64 virtio_pci virtio
> virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring serio_raw
> btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor
> raid6_pq autofs4
> [ 6288.460211] CR2: 0000000000000000
> [ 6288.460787] ---[ end trace 0000000000000000 ]---
> [ 6288.461571] RIP: 0010:nfs_inode_remove_request+0xc8/0x150 [nfs]
> [ 6288.462500] Code: ff ff 48 8b 43 38 48 8b 7b 10 a8 04 74 5b 48 85
> ff 74 56 48 8b 07 a9 00 00 08 00 74 58 48 8b 07 f6 c4 10 74 50 e8 c8
> 44 b3 d5 <48> 8b 00 f0 48 ff 88 30 ff ff ff 5b 5d 41 5c c3 cc cc cc
> cc 48 8b
> [ 6288.465136] RSP: 0018:ffffbd618353bda8 EFLAGS: 00010246
> [ 6288.465963] RAX: 0000000000000000 RBX: ffff9a29f9a25280 RCX:
> 0000000000000000
> [ 6288.467035] RDX: ffff9a29f9a252b4 RSI: 000000000000000b RDI:
> ffffef41448e3840
> [ 6288.468093] RBP: ffffef41448e3840 R08: 0000000000000038 R09:
> ffffffffffffffff
> [ 6288.469121] R10: 0000000000033f80 R11: ffff9a2a7fffa000 R12:
> ffff9a29093f98c4
> [ 6288.470109] R13: 0000000000000000 R14: ffff9a29230f62e0 R15:
> ffff9a29230f62d0
> [ 6288.471106] FS:  0000000000000000(0000) GS:ffff9a2a77c00000(0000)
> knlGS:0000000000000000
> [ 6288.472216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6288.473059] CR2: 0000000000000000 CR3: 0000000264748002 CR4:
> 0000000000770ef0
> [ 6288.474096] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 6288.475097] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 6288.476148] PKRU: 55555554
> [ 6288.476665] note: kworker/u18:8[340798] exited with irqs disabled
>
> Once we've released "req", it's not safe to dereference it anymore.
> Decrement the nrequests counter before dropping the reference.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> I've only hit this once after a lot of testing, so I can't confirm
> that
> this fixes anything. It seems like the right thing to do, however.
> ---
>  fs/nfs/write.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 8c1ee1a1a28f..7720b5e43014 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -802,8 +802,8 @@ static void nfs_inode_remove_request(struct
> nfs_page *req)
>         }
>  
>         if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> -               nfs_release_request(req);
>                 atomic_long_dec(&NFS_I(nfs_page_to_inode(req))-
> >nrequests);
> +               nfs_release_request(req);
>         }
>  }
>

Isn't this the same issue that Scott Mayhew posted a patch ("NFS: Fix
potential oops in nfs_inode_remove_request()") on July 25th?

At the time I argued for something like the above patch, but both Scott
and you argued that it was insufficient to fix the problem.

So which patch is the one we should apply?

Cheers,
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]



2023-09-27 19:25:45

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH] nfs: decrement nrequests counter before releasing the req

On Wed, 2023-09-27 at 19:13 +0000, Trond Myklebust wrote:
> On Tue, 2023-09-19 at 09:17 -0400, Jeff Layton wrote:
> > I hit this panic in testing:
> >
> > [ 6235.500016] run fstests generic/464 at 2023-09-18 22:51:24
> > [ 6288.410761] BUG: kernel NULL pointer dereference, address:
> > 0000000000000000
> > [ 6288.412174] #PF: supervisor read access in kernel mode
> > [ 6288.413160] #PF: error_code(0x0000) - not-present page
> > [ 6288.413992] PGD 0 P4D 0
> > [ 6288.414603] Oops: 0000 [#1] PREEMPT SMP PTI
> > [ 6288.415419] CPU: 0 PID: 340798 Comm: kworker/u18:8 Not tainted
> > 6.6.0-rc1-gdcf620ceebac #95
> > [ 6288.416538] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > BIOS 1.16.2-1.fc38 04/01/2014
> > [ 6288.417701] Workqueue: nfsiod rpc_async_release [sunrpc]
> > [ 6288.418676] RIP: 0010:nfs_inode_remove_request+0xc8/0x150 [nfs]
> > [ 6288.419836] Code: ff ff 48 8b 43 38 48 8b 7b 10 a8 04 74 5b 48 85
> > ff 74 56 48 8b 07 a9 00 00 08 00 74 58 48 8b 07 f6 c4 10 74 50 e8 c8
> > 44 b3 d5 <48> 8b 00 f0 48 ff 88 30 ff ff ff 5b 5d 41 5c c3 cc cc cc
> > cc 48 8b
> > [ 6288.422389] RSP: 0018:ffffbd618353bda8 EFLAGS: 00010246
> > [ 6288.423234] RAX: 0000000000000000 RBX: ffff9a29f9a25280 RCX:
> > 0000000000000000
> > [ 6288.424351] RDX: ffff9a29f9a252b4 RSI: 000000000000000b RDI:
> > ffffef41448e3840
> > [ 6288.425345] RBP: ffffef41448e3840 R08: 0000000000000038 R09:
> > ffffffffffffffff
> > [ 6288.426334] R10: 0000000000033f80 R11: ffff9a2a7fffa000 R12:
> > ffff9a29093f98c4
> > [ 6288.427353] R13: 0000000000000000 R14: ffff9a29230f62e0 R15:
> > ffff9a29230f62d0
> > [ 6288.428358] FS:? 0000000000000000(0000) GS:ffff9a2a77c00000(0000)
> > knlGS:0000000000000000
> > [ 6288.429513] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 6288.430427] CR2: 0000000000000000 CR3: 0000000264748002 CR4:
> > 0000000000770ef0
> > [ 6288.431553] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 6288.432715] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [ 6288.433698] PKRU: 55555554
> > [ 6288.434196] Call Trace:
> > [ 6288.434667]? <TASK>
> > [ 6288.435132]? ? __die+0x1f/0x70
> > [ 6288.435723]? ? page_fault_oops+0x159/0x450
> > [ 6288.436389]? ? try_to_wake_up+0x98/0x5d0
> > [ 6288.437044]? ? do_user_addr_fault+0x65/0x660
> > [ 6288.437728]? ? exc_page_fault+0x7a/0x180
> > [ 6288.438368]? ? asm_exc_page_fault+0x22/0x30
> > [ 6288.439137]? ? nfs_inode_remove_request+0xc8/0x150 [nfs]
> > [ 6288.440112]? ? nfs_inode_remove_request+0xa0/0x150 [nfs]
> > [ 6288.440924]? nfs_commit_release_pages+0x16e/0x340 [nfs]
> > [ 6288.441700]? ? __pfx_call_transmit+0x10/0x10 [sunrpc]
> > [ 6288.442475]? ? _raw_spin_lock_irqsave+0x23/0x50
> > [ 6288.443161]? nfs_commit_release+0x15/0x40 [nfs]
> > [ 6288.443926]? rpc_free_task+0x36/0x60 [sunrpc]
> > [ 6288.444741]? rpc_async_release+0x29/0x40 [sunrpc]
> > [ 6288.445509]? process_one_work+0x171/0x340
> > [ 6288.446135]? worker_thread+0x277/0x3a0
> > [ 6288.446724]? ? __pfx_worker_thread+0x10/0x10
> > [ 6288.447376]? kthread+0xf0/0x120
> > [ 6288.447903]? ? __pfx_kthread+0x10/0x10
> > [ 6288.448500]? ret_from_fork+0x2d/0x50
> > [ 6288.449078]? ? __pfx_kthread+0x10/0x10
> > [ 6288.449665]? ret_from_fork_asm+0x1b/0x30
> > [ 6288.450283]? </TASK>
> > [ 6288.450688] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> > dns_resolver nfs lockd grace sunrpc nls_iso8859_1 nls_cp437 vfat fat
> > 9p netfs ext4 kvm_intel crc16 mbcache jbd2 joydev kvm xfs irqbypass
> > virtio_net pcspkr net_failover psmouse failover 9pnet_virtio cirrus
> > drm_shmem_helper virtio_balloon drm_kms_helper button evdev drm loop
> > dm_mod zram zsmalloc crct10dif_pclmul crc32_pclmul
> > ghash_clmulni_intel sha512_ssse3 sha512_generic virtio_blk nvme
> > aesni_intel crypto_simd cryptd nvme_core t10_pi i6300esb
> > crc64_rocksoft_generic crc64_rocksoft crc64 virtio_pci virtio
> > virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring serio_raw
> > btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor
> > raid6_pq autofs4
> > [ 6288.460211] CR2: 0000000000000000
> > [ 6288.460787] ---[ end trace 0000000000000000 ]---
> > [ 6288.461571] RIP: 0010:nfs_inode_remove_request+0xc8/0x150 [nfs]
> > [ 6288.462500] Code: ff ff 48 8b 43 38 48 8b 7b 10 a8 04 74 5b 48 85
> > ff 74 56 48 8b 07 a9 00 00 08 00 74 58 48 8b 07 f6 c4 10 74 50 e8 c8
> > 44 b3 d5 <48> 8b 00 f0 48 ff 88 30 ff ff ff 5b 5d 41 5c c3 cc cc cc
> > cc 48 8b
> > [ 6288.465136] RSP: 0018:ffffbd618353bda8 EFLAGS: 00010246
> > [ 6288.465963] RAX: 0000000000000000 RBX: ffff9a29f9a25280 RCX:
> > 0000000000000000
> > [ 6288.467035] RDX: ffff9a29f9a252b4 RSI: 000000000000000b RDI:
> > ffffef41448e3840
> > [ 6288.468093] RBP: ffffef41448e3840 R08: 0000000000000038 R09:
> > ffffffffffffffff
> > [ 6288.469121] R10: 0000000000033f80 R11: ffff9a2a7fffa000 R12:
> > ffff9a29093f98c4
> > [ 6288.470109] R13: 0000000000000000 R14: ffff9a29230f62e0 R15:
> > ffff9a29230f62d0
> > [ 6288.471106] FS:? 0000000000000000(0000) GS:ffff9a2a77c00000(0000)
> > knlGS:0000000000000000
> > [ 6288.472216] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 6288.473059] CR2: 0000000000000000 CR3: 0000000264748002 CR4:
> > 0000000000770ef0
> > [ 6288.474096] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [ 6288.475097] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [ 6288.476148] PKRU: 55555554
> > [ 6288.476665] note: kworker/u18:8[340798] exited with irqs disabled
> >
> > Once we've released "req", it's not safe to dereference it anymore.
> > Decrement the nrequests counter before dropping the reference.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > I've only hit this once after a lot of testing, so I can't confirm
> > that
> > this fixes anything. It seems like the right thing to do, however.
> > ---
> > ?fs/nfs/write.c | 2 +-
> > ?1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index 8c1ee1a1a28f..7720b5e43014 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -802,8 +802,8 @@ static void nfs_inode_remove_request(struct
> > nfs_page *req)
> > ????????}
> > ?
> > ????????if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
> > -???????????????nfs_release_request(req);
> > ????????????????atomic_long_dec(&NFS_I(nfs_page_to_inode(req))-
> > > nrequests);
> > +???????????????nfs_release_request(req);
> > ????????}
> > ?}
> >
>
> Isn't this the same issue that Scott Mayhew posted a patch ("NFS: Fix
> potential oops in nfs_inode_remove_request()") on July 25th?
>
> At the time I argued for something like the above patch, but both Scott
> and you argued that it was insufficient to fix the problem.
>
> So which patch is the one we should apply?
>

It is the same bug, I had paged that discussion out, and forgotten about
it when I sent the patch (though it did seem vaguely familiar). Now that
I've looked over the code more, I think you're correct and that that is
the best solution. It's possible that there may be some other bug
involved here too, but it seems clearly wrong to be referencing the req
after releasing it.
--
Jeff Layton <[email protected]>